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Introduction 

The IDT79R4600 (R4600) and IDT79R4700 (R4700) support a wide 
variety of processor-based applications. Because of their low power 
consumption, coupled with high performance, they are well suited for a 
wide variety of embedded applications, including laser printers, 
X-terminals, internetworking equipment, imaging equipment, and high- 
end video games. The R4600 and R4700 are also well-suited to high- 
performance desktop applications such as Windows™ NT desktop and 
notebook systems, and 3-D workstations. 

Compatible with the 1DT79R4400PC family for both hardware and 
software, the R4600 and R4700 will serve in many of the same 
applications, but in addition support low-power operation for applications 
such as notebook computers. 

Floating Point 

The R4700 has improved FPA multiply operations. All other features of 
the R4700 are the same as those in the R4600. In this manual, these two 
products are referred to collectively as the R4600/R4700, except when 
information pertains only to one of them. In that situation they are 
referred to individually. 

Secondary Cache 

The R4600/R4700 does not provide integrated secondary cache and 
multiprocessor support as found in the R4000SC and R4000MC, but it is 
simple to build an external secondary cache. For most embedded 
applications, however, the large on-chip, two-way set associative caches 
make this unnecessary. 

Performance 

The R4600/R4700 brings R4000SC performance levels to the R4000PC 
package, while at the same time providing lower cost and lower power. It 
does this by providing larger on-chip caches that are two-way set 
associative, fewer pipeline stalls, and early restart for data cache misses. 
The result is higher performance than for an R4000 at the same frequency 
and for the same system latencies (exact figures are system dependent). 

Upward Compatibility 

The R4600/R4700 provides complete upward application-software 
compatibility with the IDT79R3000 family of microprocessors, including 
the IDT7 9R3000A and the IDT RISController™ family (lDT79R30xx family) 
as well the IDT79R4000 family of microprocessors. Microsoft 
Windows™ NT and UNISOFT Unix™ V.4 operating systems insure the 
availability of thousands of applications programs, geared to provide a 
complete solution to a large number of processing needs. An array of 
development tools facilitates the rapid development of R4600/R4700- 
based systems, enabling a wide variety of customers to take advantage of 
the MIPS Open Architecture philosophy. 

Together with the R4400, the R4600/R4700 provides a compatible, 
timely, and necessary evolution path from 32-bit to true, 64-bit 
computing. The original design objectives of the R4000 clearly mandated 
this evolution path; the result is a true 64-bit processor fully compatible 
with 32-bit operating systems and applications. 

The R4600/R4700 enables 32-bit applications to access 64-bit compute 
power painlessly. The software tools support a wide variety of models, 
including 32-bit address and data, 64-bit address and data, and 32-bit 
address/ 64-bit data. 32-bit address/data enables applications to be 
migrated without “cleaning up” some software. 
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The R4600/R4700 offers high-performance, large caches, and MMU and 
FPA functions to these systems. For desktop systems, the R4600/R4700 
supports a full migration to 64-bit, allowing 64-bit systems to execute true 
64-bit or older 32-bit applications. For embedded applications, the power 
and bandwidth of 64-bit data types can be used without the memory 
expansion of 64-bit addressing. 

The list on the following page summarizes the R4600/R4700 features. 
For a feature-by-feature comparison with the R4000, refer to the tables 
beginning on page 1-23. 
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Features 

• True 64-bit microprocessor 

- 64-bit integer operations 

- 64-bit floating-point operations 

- 64-bit registers 

- 64-bit virtual address space 




High-performance microprocessor 


- For R4600: 
For R4700: 

- For R4600: 
For R4700: 

- For R4600: 
For R4700: 


133 peak MIPS at 133MHz 
175 peak MIPS at 175MHz 
44 peak MFLOP/s at 133MHz 
87 peak MFLOP/s at 175MH 
109 SPECint92 and 83 SPECfp92 at 
132 SPECint92 and 94 SPECfp92 at 


Large two-way set associative caches on-chip 


150Mz 

175Mz 


• Improved FPA multiply performance (R4700 only) 
- 1 mul, 1 add every 4 clock cycles 


• High level of integration 

- 64-bit integer CPU 

- 64-bit floating-point unit 

- 16KB instruction cache; 16KB data cache 

- Flexible MMU with large TLB 


• Low-power operation 

- 3.3V or 5V power supply options 

- For R4600: 25mW/MHz internal power dissipation 

(2.5W @ 100MHz, 3.3V) 

For R4700: 24mW/MHz internal power dissipation 
(2.4W @ 100MHz, 3.3V) 

- Standby mode reduces internal power to 400mW 

• Fully software compatible with R4000 Processor Family 


• Standard operating system support includes: 

- Microsoft Windows NT 

- UNISOFT Unix™ System V.4 

- JMI C-executive 

- VX Works 

• Available in 179-pin PGA or 208-pin MQUAD 

• Input and output clock frequency: 

- Input clock at one-half pipeline frequency 

- Output clock is a programmable divisor of the pipeline frequency 

- Selectable bus frequency 

- Ratios of 1 /2. . . 1 /8 of pipeline rate 

• 64GB physical address space 

• Processor family for a wide variety of applications 

- Desktop workstations and PCs 

- Deskside or departmental servers 

- Routers 

- High-performance embedded applications 

- Notebooks 

• Large number of development tools, including: 

- Cross compilers 

- Logic models 

- Logic analyzer support 
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Device Overview 

The R4600/R4700 family brings a high-level of integration designed for 
high-performance and high-bandwidth computing. The key elements of 
the R4600/R4700 are briefly described below. An overview of these blocks 
is found here, with more detailed information on each block presented in 
subsequent chapters. 

Figure 1.1 shows a block level representation of the functional units 
within the R4600/R4700. 



Figure 1.1 R4600/R4700 Block Diagram 

Pipeline Overview 

The R4600/R4700 uses a 5-stage pipeline similar to the IDT79R3000. 
The simplicity of this pipeline allows the R4600/R4700 to be lower-cost 
and lower-power than super-scalar or super-pipelined processors. Unlike 
the R3000, the R4600/R4700 does virtual-to-physical translation in 
parallel with cache access. This allows the R4600/R4700 to operate at over 
twice the frequency of the R3000 and to support a larger TLB for address 
translation. 

Compared to the 8-stage R4000 pipeline, the R4600/R4700 is more 
efficient (requires fewer stalls). This is because the branch and load latency 
for the R4600/R4700 is shorter than for the R4000 (both are 2 cycles for 
the R4600/R4700 but are 3 and 4 cycles respectively for the R4000). 
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The internal pipeline of the R4600/R4700 processor operates at twice 
the frequency of the master clock, as discussed in Chapter 3. The 
processor achieves high throughput by pipelining cache accesses, 
shortening register access times, implementing virtual-indexed primary 
caches, and allowing the latency of certain functional units to span more 
than one pipeline clock cycles. 

Refer to Chapter 3 for a detailed discussion of the CPU pipeline 
operation, including descriptions of the delay instructions, interruptions 
to the pipeline flow caused by interlocks and exceptions, and the R4600/ 
R4700 implementation of a store buffer. Refer to Chapter 6 for a detailed 
discussion of the FPU pipeline. 

CPU Register Overview 

The R4600/R4700 has thirty-two general purpose registers. These 
registers are used for scalar integer operations and address calculation. 
The register file consists of two read ports and one write port, and is fully 
bypassed to minimize operation latency in the pipeline. 

Figure 1.2 shows the R4600/R4700 CPU registers. 


General Purpose Registers 
63 0 



Multiply and Divide Registers 


63 0 


1 

HI 


63 


cT 

L2 1 

63 

Program Counter 

0 

PC | 


Figure 1.2 R4600/R470O CPU Registers 


Two of the CPU general purpose registers have assigned functions: 

• rO is hardwired to a value of zero, and can be used as the target reg- 
ister for any instruction whose result is to be discarded. rO can also 
be used as a source when a zero value is needed. 

• r31 is used as an implicit return destination address register by the 
JAL and BAL series of instructions. 

The CPU has three special purpose registers: 

• PC — Program Counter register 

•HI — Multiply and Divide register higher result 

• LO — Multiply and Divide register lower result 
The two Multiply and Divide registers (HI, LO) store: 

• the product of integer multiply operations, or 

• the quotient (in LO) and remainder (in Hi) of integer divide operations. 
The R4600/R4700 processor has no Program Status Word (PSW) register 

as such: this is covered by the Status and Cause registers incorporated 
within the System Control Coprocessor (CPO). CPO registers are described 
later in this chapter. 
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CPU Instruction Set Overview 

Each CPU instruction is 32 bits long. As shown in Figure 1.3, there are 
three instruction formats: 

• immediate (I-type) 

• jump (J-type) 

• register (R-type) 


1-Type (Immediate) 


J-Type (Jump) 


R-Type (Register) 


31 26 25 21 20 16 15 0 


°p 

rs 

rt 

immediate || 

31 26 

25 


0 


target | 

31 26 

25 21 20 16 15 11 10 6 5 0 

[opl 

rs 

rt 

rd sa funct || 


Figure 1.3 CPU Instruction Formats 


Each format contains a number of different instructions, which are 
described further in this chapter. Fields of the instruction formats are 
described in Chapter 2. 

Instruction decoding is simplified by limiting the number of formats to 
these three. This limitation means that the more complicated (and less 
frequently used) operations and addressing modes can be synthesized by 
the compiler, using sequences of these same simple instructions. 

The instruction set can be further divided into the following groupings: 

• Load and Store instructions move data between memory and general 
registers. They are all immediate (I-type) instructions, since the only 
addressing mode supported is base register plus 16-bit, signed imme- 
diate offset. 

• Computational instructions perform arithmetic, logical, shift, multi- 
ply, and divide operations on values in registers. They include register 
(R-type, in which both the operands and the result are stored in reg- 
isters) and immediate (I-type, in which one operand is a 16-bit imme- 
diate value) formats. 

• Jump and Branch instructions change the control flow of a program. 
Jumps are always made to a paged, absolute address formed by com- 
bining a 26-bit target address with the high-order bits of the Program 
Counter (J-type format) or register address (R-type format). Branches 
have 16-bit offsets relative to the program counter (I-type). Jump And 
Link instructions save their return address in register 31. 

• Coprocessor instructions perform operations in the coprocessors. 
Coprocessor load and store instructions are I-type. 

• Coprocessor O (system coprocessor) instructions perform operations 
on CPO registers to control the memory management and exception 
handling facilities of the processor and the standby mode for power 
management. These are listed in Table 1.17. 

• Special instructions perform system calls and breakpoint operations. 
These instructions are always R-type. 

• Exception instructions cause a branch to the general exception-han- 
dling vector based upon the result of a comparison. These instruc- 
tions occur in both R-type (both the operands and the result are 
registers) and I-type (one operand is a 16-bit immediate value) for- 
mats. 
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Chapter 2 provides more detail about these instructions, and Appendix 
A gives a complete description of each. 

Table 1.1 through Table 1.16 list CPU instructions common to MIPS 
R-Series processors, along with the level in which they first appeared. The 
last column in each table refers to the MIPS ISA level in which the 
instruction first appeared. Table 1.17 lists CPO instructions. 


OpCode 

Description 

MIPS ISA Level 1 

LB 

Load Byte 

I 

LBU 

Load Byte Unsigned 

I 

LH 

Load Halfword 

I 

LHU 

Load Halfword Unsigned 

I 

LW 

Load Word 

I 

LWL 

Load Word Left 

I 

LWR 

Load Word Right 

I 

SB 

Store Byte 

I 

SH 

Store Halfword 

I 

SW 

Store Word 

I 

SWL 

Store Word Left 

I 

SWR 

Store Word Right 

I 

Note: 1 For Tables 1. 1 through 1.17 this column refers to the level in which the 
instruction first appeared. 


Table 1.1 CPU Instruction Set: Load and Store Instructions 


OpCode 

Description 

MIPS ISA Level 

ADDI 

Add Immediate 

I 

ADDIU 

Add Immediate Unsigned 

I 

sun 

Set on Less Than Immediate 

I 

SLT1U 

Set on Less Than Immediate 
Unsigned 

I 

ANDI 

AND Immediate 

I 

ORI 

OR Immediate 

I 

XORI 

Exclusive OR Immediate 

I 

LUI 

Load Upper Immediate 

I 


Table 1.2 CPU Instruction Set: Arithmetic Instructions (ALU Immediate) 
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OpCode 

Description 

MIPS ISA Level 

ADD 

Add 

I 

ADDU 

Add Unsigned 

I 

SUB 

Subtract 

I 

SUBU 

Subtract Unsigned 

I 

SLT 

Set on Less Than 

I 

SLTU 

Set on Less Than Unsigned 

I 

AND 

AND 

I 

OR 

OR 

I 

XOR 

Exclusive OR 

I 

NOR 

NOR 

I 


Table 1.3 CPU Instruction Set: Arithmetic (3-Operand, R-Type) 


OpCode 

Description 

MIPS ISA Level 

MULT 

Multiply 

I 

MULTU 

Multiply Unsigned 

I 

DIV 

Divide 

I 

DIVU 

Divide Unsigned 

I 

MFHI 

Move From HI 

I 

MTHI 

Move To HI 

I 

MFLO 

Move From LO 

I 

MTLO 

Move To LO 

I 


Table 1.4 CPU Instruction Set: Multiply and Divide Instructions 


OpCode 

Description 

MIPS ISA Level 

J 

Jump 

I 

JAL 

Jump And Link 

I 


Table 1.5 CPU Instruction Set: Jump and Branch Instruction 
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OpCode 

Description 

JR 

Jump Register 


MIPS ISA Level 


JALR Jump And Link Register 


BEQ Branch on Equal 


BNE Branch on Not Equal 


BLEZ Branch on Less Than or Equal to Zero 



BGTZ 

Branch on Greater Than Zero 

I 

BLTZ 

Branch on Less Than Zero 

I 

BGEZ 

Branch on Greater Than or Equal to Zero 

I 

BLTZAL 

Branch on Less Than Zero And Link 

I 

BGEZAL 

Branch on Greater Than or Equal to Zero 
And Link 

I 


Table 1.5 CPU Instruction Set: Jump and Branch Instruction 


OpCode 

Description 

MIPS ISA Level 

SLL 

Shift Left Logical 

I 

SRL 

Shift Right Logical 

I 

SRA 

Shift Right Arithmetic 

I 

SLLV 

Shift Left Logical Variable 

I 

SRLV 

Shift Right Logical Variable 

I 

SRAV 

Shift Right Arithmetic Variable 

I 

Table 1.6 CPU Instruction Set: Shift Instructions 

OpCode 

Description 

MIPS ISA Level 


Load Word to Coprocessor z 

I 


Store Word from Coprocessor z 

I 


Move To Coprocessor z 

I 


Move From Coprocessor z 

I 

CTCz 

Move Control to Coprocessor z 

I 

CFCz 

Move Control From Coprocessor z 

I 

COPz 

Coprocessor Operation z 

I 

BCzT 

Branch on Coprocessor z True 

I 

BCzF 

Branch on Coprocessor z False 

I 


Table 1.7 Instruction Set: Coprocessor Instructions 


OpCode 

Description 

MIPS ISA Level 

SYSCALL 

System Call 

I 

BREAK 

Break 

I 


Table 1.8 CPU Instruction Set: Special Instructions 
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OpCode 

Description 

MIPS ISA Level 

LD 

Load Doubleword 

III 

LDL 

Load Doubleword Left 

III 

LDR 

Load Doubleword Right 

III 

LL 

Load Linked 

II 

LLD 

Load Linked Doubleword 

III 

LWU 

Load Word Unsigned 

III 

SC 

Store Conditional 

II 

SCD 

Store Conditional Doubleword 

III 

SD 

Store Doubleword 

III 

SDL 

Store Doubleword Left 

III 

SDR 

Store Doubleword Right 

III 

SYNC 

Sync 

II 


Table 1.9 MIPS 2/MIPS 3 Additional: Load and Store Instructions 


OpCode 

Description 

MIPS ISA Level 

DADDI 

Doubleword Add Immediate 

III 

DADDIU 

Doubleword Add Immediate 
Unsigned 

III 


Table 1.10 MIPS 2/MIPS 3 Additional: Arithmetic Instructions (ALU Immediate) 


OpCode 

Description 

MIPS ISA Level 

DMULT 

Doubleword Multiply 

III 

DMULTU 

Doubleword Multiply Unsigned 

III 

DDIV 

Doubleword Divide 

III 

DDIVU 

Doubleword Divide Unsigned 

III 


Table 1.11 MIPS 2/MIPS 3 Additional: Multiply and Divide Instructions 
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OpCode 

Description 

MIPS ISA Level 

BEQL 

Branch on Equal Likely 

II 

BNEL 

Branch on Not Equal Likely 

II 

BLEZL 

Branch on Less Than or Equal to Zero 
Likely 

II 

BGTZL 

Branch on Greater Than Zero Likely 

II 

BLTZL 

Branch on Less Than Zero Likely 

II 

BGEZL 

Branch on Greater Than or Equal to Zero 
Likely 

II 

BLTZALL 

Branch on Less Than Zero And Link 

Likely 

II 

BGEZALL 

Branch on Greater Than or Equal to Zero 
And Link Likely 

II 

BCzTL 

Branch on Coprocessor z True Likely 

II 

BCzFL 

Branch on Coprocessor z False Likely 

II 


Table 1.12 MIPS 2/MIPS 3 Additional: Branch Instructions 


OpCode 

Description 

MIPS ISA Level 

DADD 

Doubleword Add 

III 

DADDU 

Doubleword Add Unsigned 

III 

DSUB 

Doubleword Subtract 

III 

DSUBU 

Doubleword Subtract Unsigned 

III 


Table 1.13 MIPS 2 /MIPS 3 Additional: Arithmetic Instructions 
(3-operand, R-type) 


OpCode 

Description 

MIPS ISA Level 

DSLL 

Doubleword Shift Left Logical 

III 

DSRL 

Doubleword Shift Right Logical 

III 

DSRA 

Doubleword Shift Right Arithmetic 

III 

DSLLV 

Doubleword Shift Left Logical Variable 

III 

DSRLV 

Doubleword Shift Right Logical Variable 

III 

DSRAV 

Doubleword Shift Right Arithmetic Variable 

III 

DSLL32 

Doubleword Shift Left Logical + 32 

III 

DSRL32 

Doubleword Shift Right Logical + 32 

III 

DSRA32 

Doubleword Shift Right Arithmetic + 32 

III 


Table 1.14 MIPS 2/MIPS 3 Additional: Shift Instructions 
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OpCode 

Description 

MIPS ISA Level 

TGE 

Trap if Greater Than or Equal 

II 

TGEU 

Trap if Greater Than or Equal Unsigned 

II 


Trap if Less Than 

II 

QQH 

Trap if Less Than Unsigned 

II 

on 

Trap if Equal 

II 

TNE 

Trap if Not Equal 

II 

TGEI 

Trap if Greater Than or Equal Immediate 

i 

II 

TGEIU 

Trap if Greater Than or Equal Immediate 
Unsigned 

II 

TLTI 

Trap if Less Than Immediate 

II 

TLTIU 

Trap if Less Than Immediate Unsigned 

i - - -■ . _ ... 

II 

TEQI 

Trap if Equal Immediate 

II 

TNEI 

Trap if Not Equal Immediate 

II 


Table 1.15 MIPS 2/MIPS 3 Additional: Exception Instructions 


OpCode 

Description 

MIPS ISA Level 

DMFCz 

Doubleword Move From Coprocessor z 

II 

DMTCz 

Doubleword Move To Coprocessor z 

II 

LDCz 

Load Double Coprocessor z 

II 

SDCz 

Store Double Coprocessor z 

II 


Table 1.16 MIPS 2/MIPS 3 Additional: Coprocessor Instructions 


OpCode 

Description 

MIPS ISA Level 

DMFCO 

Doubleword Move From CPO 

III 

DMTCO 

Doubleword Move To CPO 

III 

MTCO 

Move to CPO 

I 

MFCO 

Move from CPO 

I 

TLBR 

Read Indexed TLB Entiy 

I 

TLBWI 

Write Indexed TLB Entry 

I 

TLBWR 

Write Random TLB Entry 

I 

TLBP 

Probe TLB for Matching Entry 

I 

CACHE 

Cache Operation 


ERET 

Exception Return 

R4xxx only 

WAIT 

Enter Standby mode 

R4600 only 


Table 1.17 CPO Instructions 
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Data Formats and Addressing 

The R4600/R4700 processor uses four data formats: a 64-bit 
doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte 
ordering within each of the larger data formats — halfword, word, 
doubleword — can be configured in either big-endian or little-endian order. 
Endianness refers to the location of byte 0 within the multi-byte data 
structure. Figures 1 .4 and 1.5 show the ordering of bytes within words and 
the ordering of words within multiple-word structures for the big-endian 
and little-endian conventions. 

When the R4000 processor is configured as a big-endian system, byte 0 
is the most-significant (leftmost) byte, thereby providing compatibility with 
MC 68000’ and IBM 370' conventions. Figure 1.4 shows this configuration. 
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Figure 1.4 Big-Endian Byte Ordering 


When configured as a little-endian system, byte 0 is always the least- 
significant (rightmost) byte, which is compatible with iAPX’ x86 and DEC 
VAX’ conventions. Figure 1.5 shows this configuration. 



In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit 
designations are always little-endian (although no instructions explicitly 
designate bit positions within words). 

Figures 1.6 and 1.7 show little-endian and big-endian byte ordering in 
doublewords. 
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□□□□BEDE3 

Bits in a Byte 


Figure 1.7 Big-Endian Data in a Doubleword 


The CPU uses byte addressing for halfword, word, and doubleword 
accesses with the following alignment constraints: 

• Halfword accesses must be aligned on an even byte boundary 
(0,2,4...). 

• Word accesses must be aligned on a byte boundary divisible by four 
(0, 4, 8...). 

• Doubleword accesses must be aligned on a byte boundary divisible by 
eight (0, 8, 16...). 

The following special instructions load and store words that are not 
aligned on 4-byte (word) or 8-word (doubleword) boundaries: 

LWL LWR SWL SWR 

LDL LDR SDL SDR 

These instructions are used in pairs to provide addressing of misaligned 
words. Addressing misaligned data incurs one additional instruction cycle 
over that required for addressing aligned data. This extra cycle is because 
of an extra instruction for the “pair” (e.g., LWL and LWR form a pair). Also 
note that the CPU moves the unaligned data at the same rate as a 
hardware mechanism. 


1-14 








Overview 


Chapter 1 


Figures 1.8 and 1.9 show the access of a misaligned word that has byte 
address 3. 


Higher 

Address 

h 

Lower 

Address 


Bit# 

|~31 24 23 1615 8~7 61 


4 

5 

1 1 




3 


Figure 1.8 Big-Endian Misaligned Word Addressing 



Coprocessors (CP0-CP2) 

The MIPS ISA (MIPS III) for the R4600/R4700 (and R4000/R4400) 
defines three coprocessors (designated CPO through CP2): 

• Coprocessor 0 (CPO) is incorporated on the CPU chip and supports 
the virtual memoiy system and exception handling. CPO is also re- 
ferred to as the System Control Coprocessor. 

• Coprocessor 1 (CPI) is incorporated on the R4600/R4700, and imple- 
ments the MIPS floating-point instruction set. 

• Coprocessor 2 (CP2) is reserved for future use. 

CPO and CPI are described in the sections that follow. 

System Control Coprocessor, CPO 

CPO translates virtual addresses into physical addresses and manages 
exceptions and transitions between kernel, supervisor, and user states. 
CPO also controls the cache subsystem, as well as providing diagnostic 
control and error recovery facilities. 

CPO is also used to control the power management for the R4600/ 
R4700. This is the standby mode and it can be used to reduce the power 
consumption of the internal core of the CPU. The standby mode is entered 
by executing the WAIT instruction with the SysAD bus idle and is exited by 
any interrupt. This feature is discussed in Appendix G. 
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The CPO registers shown in Figure 1.10 and described in Table 1.18 on 
page 1.17 manipulate the memory management and exception handling 
capabilities of the CPU. 

Note: Access to reserved or undefined CPO register results are unde- 
fined. An exception may or may not result. 



Figure 1.10 R4600/R4700 CPO Registers 
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Description 


Index 


Random 


EntryLoO 


EntiyLol 



Programmable pointer Into TLB array 


Pseudorandom pointer into TLB array (read only) 


Low half of TLB entry for even virtual page (VPN) 


Low half of TLB entry for odd virtual page (VPN) 


Pointer to kernel virtual page table entry (PTE) for 32- 
bit address spaces 


TLB Page Mask 


Number of wired TLB entries 


Reserved 


Bad virtual address 


Timer Count 


High half of TLB entry 


Timer Compare 


Status register 


Cause of last exception 


Exception Program Counter 


Processor Revision Identifier 


Configuration register 


Load Linked Address 


Reserved 


Pointer to kernel virtual PTE table for 64-bit address 
spaces 


Reserved 


Secondary-cache error checking and correcting (ECC) 
and Primary parity 


Cache Error and Status register 


Cache Tag register 


Cache Tag register 


Error Exception Program Counter 


Reserved 


Table 1.18 System Control Coprocessor (CPO) Register Definitions 
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Floating-Point Co-Processor 

The R4600/R4700 incorporates an entire floating-point co-processor on 
chip, including a floating-point register file and execution units. The 
floating-point co-processor forms a “seamless” interface with the integer 
unit, decoding and executing instructions in parallel with the integer unit. 
The R4700 enhances the FPA implemented in the original R4600, resulting 
in an improved peak MFLOP rate. 

Floating-Point Units 

The R4600/R4700 floating-point execution units supports single and 
double precision arithmetic, as specified in the IEEE Standard 754. The 
execution unit is broken into a separate multiply unit and a combined 
add/convert/divide/square root unit. Overlap of multiplies and add/ 
subtract is supported. The multiplier is partially pipelined, allowing a new 
multiply to begin every 6 cycles for the R4600, and every 4 cycles for the 
R4700. 

As in the R3010 and R4000, the R4600/R4700 maintains fully precise 
floating-point exceptions while allowing both overlapped and pipelined 
operations. Precise exceptions are extremely important in mission-critical 
environments, such as ADA, and highly desirable for debugging in any 
environment. 

The floating-point unit’s operation set includes floating-point add, 
subtract, multiply, divide, square root, conversion between fixed-point and 
floating-point format, conversion among floating-point formats, and 
floating-point compare. These operations comply with the IEEE Standard 
754. 

Table 1.19 shows the latencies of some of the floating-point instructions 
in internal processor cycles. Due to pipelining, repeat rates may be higher. 
Also note that many operations are autonomous and can go in parallel. 


Operation 

Single Precision 

Double Precision 

ADD 

4 

4 

SUB 

4 

4 

MUL 

R4600: 8 
R4700: 4 

R4600: 8 

R4700: 5 

DIV 

32 

61 

SQRT 

31 

60 

CMP 

3 

3 

FIX 

4 

4 

FLOAT 

6 

6 

ABS 

1 

1 

MOV 

1 

1 

NEG 

1 

1 

LWC1, LDC1 

2 

2 

SWC 1 , SDC 1 

1 

1 


Table 1.19 Floating-Point Latency Cycles 
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Virtual to Physical Address Mapping 

The R4600/R4700 provides three modes of operation: 

• user mode 

• supervisor mode 

• kernel mode 

This mechanism is available to system software to provide a secure 
environment for user processes. Bits in a status register determine the 
mode of operation. In the user mode, the R4600/R4700 provides a single, 
uniform virtual address space of 256GB (2GB when Status.UX = 0). 

When operating in the kernel mode, four distinct virtual address spaces, 
totalling 1024GB (4GB when Status.KX = 0), are simultaneously available 
and are differentiated by the high-order bits of the virtual address. 

The R4600/R4700 processors also support a supervisor mode in which 
the virtual address space is 256.5GB (2.5GB when Stauts.SX = 0), divided 
into three regions based on the high-order bits of the virtual address. 

When the R4600/R4700 uses 64-bit virtual addresses, the address 
space layouts are an upward compatible extension of the 32-bit virtual 
address space layout. A detailed description of the addressing is given in 
Chapter 4. 

Joint TLB 

For fast virtual-to-physical address decoding, the R4600/R4700 uses a 
large, fully associative TLB which maps 96 Virtual pages to their 
corresponding physical addresses. The TLB is organized as 48 pairs of 
even-odd entries, and maps a virtual address and address space identifier 
into the large, 64GB physical address space. 

Two mechanisms are provided to assist in controlling the amount of 
mapped space, and the replacement characteristics of various memory 
regions. First, the page size can be configured, on a per-entry basis, to map 
a page size of 4KB to 16MB (in multiples of 4). A CPO register is loaded with 
the page size of a mapping, and that size is entered into the TLB when a 
new entry is written. Thus, operating systems can provide special purpose 
maps; for example, a typical frame buffer can be memory mapped using 
only one TLB entry. 

The second mechanism controls the replacement algorithm when a TLB 
miss occurs. The R4600/R4700 provides a random replacement algorithm 
to select a TLB entry to be written with a new mapping: however, the 
processor provides a mechanism whereby a system specific number of 
mappings can be locked into the TLB, and thus avoid being randomly 
replaced. This facilitates the design of real-time systems, by allowing 
deterministic access to critical software. 

The joint TLB also contains information to control the cache coherency 
protocol for each page. Specifically, each page has attribute bits to 
determine whether the coherency algorithm is: uncached, non-coherent 
write-back, non-coherent write-through write-allocate, non-coherent 
write-through no write-allocate, sharable, exclusive, or update. Non- 
coherent write-back is typically used for both code and data on the R4600/ 
R4700; the write-through modes support more efficient frame buffer 
accesses than the R4000 family. The coherent modes are supported for 
R4000 compatibility and generate different transaction types on the 
system interface: cache coherency is not supported however. 
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Instruction TLB 

The R4600/R4700 also Incorporates a 2-entry instruction TLB. Each 
entry maps a 4KB page. The instruction TLB improves performance by 
allowing instruction address translation to occur in parallel with data 
address translation. When a miss occurs on an instruction address 
translation, the least-recently used 1TLB entry is filled from the JTLB. The 
operation of the 1TLB is invisible to the user. 

Data TLB 

The R4600/R4700 also incorporates a 4-entry data TLB. Each entry 
maps a 4KB page. The data TLB improves performance by allowing data 
address translation to occur in parallel with data address translation. 
When a miss occurs on an data address translation, the DTLB is filled from 
the JTLB. The DTLB refill is pseudo-LRU: the least recently used entry of 
the least recently used half is filled. The operation of the DTLB is invisible 
to the user. 

Cache Memory 

In order to keep the R4600/R4700’s high-performance pipeline full and 
operating efficiently, the R4600/R4700 incorporates on-chip instruction 
and data caches that can be accessed in a single processor cycle. Each 
cache has its own 64-bit data path and can be accessed in parallel. The 
cache subsystem provides the integer and floating-point units with an 
aggregate bandwidth of 1.6GB per second at a system clock frequency of 
50MHz. 

Furthermore, the large, Two-way set associative caches increase 
emulation performance of DOS and Windows 3.1 applications when 
running under Windows NT. 

Instruction Cache 

The R4600/R4700 incorporates a two-way set associative on-chip 
instruction cache. This virtually indexed, physically tagged cache is 16KB 
in size and is protected with word parity. 

Because the cache is virtually indexed, the virtual-to-physical address 
translation occurs in parallel with the cache access, thus further 
increasing performance by allowing these two operations to occur 
simultaneously. The tag holds a 24-bit physical address and valid bit, and 
is parity protected. 

The instruction cache is 64-bits wide, and can be refilled or accessed in 
a single processor cycle. Instruction fetches require only 32 bits per cycle, 
for a peak instruction bandwidth of 700 MB/sec @ 175MHz. Sequential 
accesses take advantage of the 64-bit fetch to reduce power dissipation, 
and cache miss refill writes 64 bits per cycle to minimize the cache miss 
penalty. The line size is eight instructions (32 bytes) to maximize 
performance. 

Data Cache 

For fast, single cycle data access, the R4600/R4700 includes a 16KB on- 
chip data cache that is two-way set associative with a fixed 32 -byte (eight 
words) line size. Both the D-cache and the I-cache can be accessed each 
pipeline cycle; thus, the data bandwidth is 1400 MB/sec @ 175 MHz, in 
addition to the 700 MB/sec instruction bandwidth. 

The data cache is protected with byte parity and its tag is protected with 
a single parity bit. It is virtually indexed and physically tagged to allow 
simultaneous address translation and data cache access 
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The normal write policy is writeback, which means that a store to a cache 
line does not immediately cause memory to be updated. This increases 
system performance by reducing bus traffic and eliminating the bottleneck 
of waiting for each store operation to finish before issuing a subsequent 
memory operation. Software can however select write-through on a per- 
page basis when it is appropriate, such as for frame buffers. 

Associated with the Data Cache is the store buffer. When the R4600/ 
R4700 executes a Store instruction, this single-entry buffer gets written 
with the store data while the tag comparison is performed. If the tag 
matches, then the data is written into the Data Cache in the next cycle that 
the Data Cache is not accessed (the next non-load cycle). The store buffer 
allows the R4600/R4700 to execute a store every processor cycle and to 
perform back-to-back stores without penalty. 

Write buffer 

Writes to external memory, whether cache miss writebacks or stores to 
uncached or write-through addresses, use the on-chip write buffer. The 
write buffer holds up to four 64-bit address and data pairs or 1 cache line 
to be written back. The entire buffer is used for a data cache writeback and 
allows the processor to proceed in parallel with memory update. For 
uncached and write-through stores, the write buffer significantly increases 
performance over the R4000 family of processors. 

R4600/R4700 Clocks 

The R4600/R4700 has a number of clocks for the user. First, there is 
the pipeline clock, PClock. This clock is used for the pipeline and pipeline 
related functions internal to the R4600/R4700. It is two times the 
MasterClock frequency. The next clock is the system interface clock, 
SClock. This is also an internal clock and is used to sample data at the 
system interface and to clock data into the processor system interface 
output registers. The SClock is a divided version of the PClock. The divisor 
is selected at boot time. 

There are three external clocks. (Some outputs are replicated to minimize 
loading.) The MasterOut is at the same frequency as MasterClock and can 
be used to clock certain external logic. The other clocks are used by the 
external agent. These are the TClock, Transmit clock, and the RClock, 
Receive clock. The TClock is used to clock the output registers (signals 
transmitted to the R4600/R4700) of the external agent and is at the same 
frequency as SClock. The RClock is used to clock the input register (signals 
received from the R4600/R4700) of the external agent. It is also at the 
same frequency as the SClock but its phase leads the SClock and TClock 
by 25%. The R4600/R4700 implements an on-chip PLL to eliminate the 
effects of clock skew. 
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System Interface 

The R4600/R4700 supports a 64-bit system interface that is compatible 
with the R4000PC system interface. This interface operates from two 
clocks provided by the R4600/R4700, TClock[l:0] and RClock[l:0], at a 
division of the pipeline clock. 

The interface consists of a 64-bit Address/Data bus with 8 check bits 
and a 9-bit command bus. In addition, there are 8 handshake signals and 
6 interrupt inputs. The interface has a simple timing ‘specification and is 
capable of transferring data between the processor and memory at a peak 
rate of 400MB/sec at 50MHz. 

Figure 1.11 shows a typical system using the R4600/R4700. In this 
example there is DRAM, a boot EPROM and an optional secondary cache. 
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Comparison of R4600/R4700 and R4400 

This section compares features of the R4600/R4700 to the earlier R4400 
PC. Table 1.20 to Table 1.26 highlight some of the differences between the 
R4600/R4700 and the R4400 PC. This list is not exhaustive. 


Item 

R4400 PC 

R4600/R4700 

I/O 

R4400: TTL compatible 

RV4400: LV CMOS 

R4600/R4700: TTL-compatlble (5V±0.5%) 
RV4600/RV4700: LVCMOS (3.3V±0.3V) 

Package 

179-pin ceramic PGA 

same and 208-pin MQUAD 

JTAG 

yes 

no (serial out connected directly to serial in) 

Block transfer sizes 

16B or 32B 

32B 

Sclock divisor 

2, 3, 4. 6, 8 

2, 3, 4, 5, 6, 7, 8 

Non-block writes 

max throughput of 4 sclock cycles 

two new system interface protocol options 
that support 2 sclock cycle throughput 
(remains 4 in compatibility mode) 

Serial configuration 

as described in R4000 User's Guide 

different as described in Table 9.2 on 
page 9-7 

Address bits 63.. 56 on reads and 
writes 

zero 

bits 19.. 12 of virtual address 

Uncached and write-through 
stores 

uncached stores are buffered in 1- 
entry uncached store buffer (write 
through not possible) 

uncached and write-though stores buffered 
in 4-entry write buffer 

SysADC 

parity only 

same 

SysADC for non-data cycles 

parity 

zero 

SysCmdP 

parity 

zero 

Parity error during writeback 

use Cache Error exception 

output bad parity 

Error bit in data identifier of 
read responses 

Bus Error if error bit set for any dou- 
bleword 

only check error bit of first doubleword; all 
other error bits ignored 

Parity error on read data 

Bus Error if parity error in any dou- 
bleword 

bad parity written to cache; take Cache 

Error exception if bad parity occurs on dou- 
blewords that the processor is waiting for 

Block writes 

1-2 null cycles between address and 
data 

0 cycles between address and data 

Release after Read Request 

variable latency 

0 latency 

SysAD value for x cycles of write- 
back data pattern 

data bus undefined 

data bus maintains last D cycle value 

SysAD bus use after last D cycle 
of writeback 

data bus undefined 

trailing x cycles (e.g. DDxxDDxx, not 
DDxxDD) follow rule in entry immediately 
preceding 

Output slew rate 

dynamic feedback control 

simple CMOS output buffers with 2-bit 
static strength control 

IOOut output 

output slew rate control feedback 
loop output 

driven HIGH, do not connect 
(reserved for future output) 

IOIn input 

output slew rate control input 

should be driven high 
(reserved for future input) 

GrpRunB output 

do not connect 

same 

(reserved for future output) 

GrpStallB input 

should be connected to VCC 

same 

(reserved for future input) 

FaultB output pin 

indicates compare mismatch 

driven HIGH, do not connect 
(reserved for future output) 


Table 1.20 System Interface Comparison Between R4400 PC and R4600/R4700 
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Item 

R4400 PC 

R4600/R4700 

Cache Sizes 

16KB Instruction cache, 16KB Data 
cache 

16KB Instruction cache, 16KB Data 
cache 

Cache Line Sizes 

software selectable between 16B and 
32B 

fixed at 32B 

Cache Index 

vAddr 13 o 

vAddr 12 .. 0 

Cache Tag 

pAddr 35 12 

same 

Cache Organization 

direct mapped 

2 -way set associative 

Data cache write policy 

write-allocate and write-back 

write-allocate or not based on TLB 
entry, write-through or not based on 

TLB entry 

Data cache miss 

stall, output address, copy dirty data to 
writeback buffer, refill cache, output 
writeback data 

same, with FIFO to select the set to 
refill 

Data order for block 
reads 

sub-block ordering 

same 

Data order for block 
writes 

sequential 

same 

Instruction cache miss 
restart 

restart after all data received and writ- 
ten to cache 

same 

Data cache miss restart 

restart after all data received and writ- 
ten to cache 

restart on first doubleword, send sub- 
sequent doublewords to response 
buffer 

Instruction Tag 

2-bit cache state 

1-bit cache state 

Cache miss overhead 

5-8 cycles 

3 cycles 

Instruction cache parity 

1 parity bit per 8 data bits 

1 parity bit per 32 data bits 

Data cache parity 

1 parity bit per 8 data bits 

same 


Table 1.21 Cache Comparison Between R4400 PC and R4600/R4700 
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Item 

R4400 PC 

R4600/R4700 

Instruction virtual 
address translation 

2-entry ITLB 

same 

ITLB miss 

1 cycle penalty, refilled from JTLB, 

LRU replacement 

1 cycle on branch, jump, and ERET, 2 
cycles otherwise, refilled from JTLB, 

LRU replacement 

Data virtual address 
translation 

done directly in JTLB 

4-entry DTLB 

DTLB miss 

n.a. 

1 cycle penalty, refilled from JTLB, 
pseudo-LRU replacement 

JTLB 

48 entries of even/odd page pairs, fully 
associative 

same 

Page size 

4KB, 16KB, 16MB 

same 

Multiple entry match 
in JTLB 

sets TS in Status and disables TLB 
until Reset to prevent damage 

no damage for multiple match; no 
detection or shutdown implemented 

Virtual address size 

VSIZE = 40 

same 

Physical address size 

PSIZE = 36 

same 


Table 1.22 TLB Comparison Between R4400 PC and R4600/R4700 


Item 

R4400 PC 

R4600/R4700 

ALU latency 

1 cycle 

1 cycle 

Load latency 

3 cycles 

2 cycles 

Branch latency 

4 cycles (2 cycle penalty for taken 
branches) 

2 cycles (no penalty for taken 
branches) 

Store buffer (not write 
buffer) 

2 doublewords 

1 doubleword 

Integer multiply 

integer multiply hardware, 1 cycle to 
issue 

done in floating-point multiplier, 4 
cycles to issue 

Integer divide 

done in integer datapath adder, slips 
until done 

done in floating-point adder, 4 cycles to 
issue 

Integer multiply 

HIGH and LOW available at the same 
time 

LOW available one cycle before HIGH 

Integer divide 

HIGH and LOW available at the same 
time 

HIGH available one cycle before LOW 

HIGH and LOW hazards 

yes, HIGH and LOW written early in 
pipeline 

no, HIGH and LOW written after W 

MFHI/MFLO latency 

1 cycle 

2 cycles 

SLLV, SRLV, SRAV 

2 cycles to issue 

1 cycle to issue 

DSLL, DSRL, DSRA, 
DSLL32, DSRL32, 
DSRA32, DSLLV, 

DSRLV, DSRAV 

2 cycles to issue 

1 cycle to issue 


Table 1.23 Pipeline Comparison Between R4400 PC and R4600/R47O0 
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Item 

R4400 PC 

R4600/R4700 

WatchLo, WatchHi 

implemented 

unimplemented (no watch registers) 

Config 

as described in R4000 User's Guide 

subset 

Status 

as described in R4000 User's Guide , 
but RP not functional 

no TS or RP 

Low-power standby 
mode 

no 

WAIT instruction disables internal 
clock, freezing pipeline and other state; 
resume on interrupt 

MFCO/MTCO hazard 

only hazardous for certain cpO register 
combinations 

always hazardous — detected and 1- 
cycle slip inserted 

Entry LoO, EntryLol 

as described in R4000 User's Guide 

two new cache algorithms added to C 
field for non-coherent write-through 

TagLo, TagHi, ECC, 
CacheErr 

R4400SC bits implemented but mean- 
ingless 

Only bits meaningful on R4400 PC 
implemented 

TagLo 

as described in R4000 User's Guide 

bits 5.. 3 read/writeable but otherwise 
unused, bit 2 used for F bit 

Exceptions 

as described in R4000 User's Guide 
(VCEI and VCED not possible) 

VCEI, VCED, and WATCH exceptions 
not implemented 

Index CACHE ops 

I Fill CACHE op 

use vAddr 13 4 to select line 

use vAddr 13 to select set, vAddr 12 ..5 to 
select line of set 

Index Store Tag CACHE 
op 

Status.CE ignored 

TagLo.P stored if Status.CE set 

PRId 

Imp = 0x04 

R4600: Imp = 0x20 

R4700: Imp = 0x21 


Table 1.24 Coprocessor 0 Comparison Between R4400 PC and R4600/R4700 


Item 

R4400 PC 

R4600/R4700 

Possible exception stall 

only for operands that can cause 
exceptions 

some simplifications in detection hard- 
ware 

Floating-point divide 

separate divide unit 

done in floating-point adder 

Floating-point square 
root 

done in floating-point adder 

same 

Converts to/from 64-bit 
integer 

uses unimplemented for integer oper- 
ands/results with more than 53 bits of 
precision 

handles full 64-bit operands and 
results 

Floating-point registers 

Status. FR enables all 32 floating point 
registers 

same 

FCR0 

Imp = 0x05 

R4600: Imp = 0x20 

R4700: Imp = 0x21 


Table 1.25 Coprocessor 1 Comparison Between R4400 PC and R4600/R4700 
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Introduction 

This chapter is an overview of the central processing unit (CPU) 
instruction set; refer to Appendix A for detailed descriptions of individual 
CPU instructions. 

An overview of the floating-point unit (FPU) instruction set is in 
Chapter 6; refer to Appendix B for detailed descriptions of individual FPU 
instructions. 

CPU Instruction Formats 

Each CPU instruction consists of a single 32-bit word, aligned on a word 
boundary. There are three instruction formats — immediate (I-type), jump 
(J-type), and register (R-type) — as shown in Figure 2. 1. The use of a small 
number of instruction formats simplifies instruction decoding (thus higher 
frequency operations) and allowing the compiler to synthesize more 
complicated (and less frequently used) operations and addressing modes 
from these three formats as needed. 


1-Type (Immediate) 

31 26 25 21 20 16 15 


I OP 


rs 


rt 


immediate 


1 


J-Type (Jump) 

31 26 25 




^arget^ 


J 


R-Type (Register) 

31 26 25 21 20 16 15 1110 6 5 0 

funct fe 



rs 


rd 

sa 


Figure Legend: 

op 6-bit operation code 

rs 5-bit source register specifier 

rt 5-bit target (source/destination) register or branch condition 

immediate 16-bit immediate value, branch displacement or address 
displacement 

target 26-bit jump target address 

rd 5-bit destination register specifier 

sa 5-bit shift amount 

funct 6-bit function field 


Figure 2. 1 CPU Instruction Formats 


In the MIPS architecture, coprocessor instructions are implementation- 
dependent; refer to Appendix A for details of individual Coprocessor 0 
instructions. 
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Load and Store Instructions 

Load and store are immediate (I-type) instructions that move data 
between memory and the general registers. The only addressing mode that 
load and store instructions directly support is base register plus 16-bit 
signed immediate offset. 

Scheduling a Load Delay Slot 

A load instruction that does not allow its result to be used by the 
instruction immediately following is called a delayed load instruction. The 
instruction slot immediately following this delayed load instruction is 
referred to as the load delay slot. 

In the R4600/R4700 processor, the instruction immediately following a 
load instruction can request the contents of the loaded register, however, 
in such cases, hardware interlocks insert additional real cycles. 
Consequently, scheduling load delay slots can be desirable, both for 
performance and R-Series (e.g., R3051) processor compatibility. However, 
the scheduling of load delay slots is not absolutely required. 

Defining Access Types 

Access type indicates the size of an R4600/R4700 processor data item 
to be loaded or stored, set by the load or store instruction opcode. Access 
types are defined in Appendix A. 

Regardless of access type or byte ordering (endianness), the address 
given specifies the low-order byte in the addressed field. For a big-endian 
configuration, the low-order byte is the most-significant byte; for a little- 
endian configuration, the low-order byte is the least-significant byte. 

The access type, together with the three low-order bits of the address, 
define the bytes accessed within the addressed doubleword, which is 
shown in Table 2. 1 on page 2-3. 
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Only the combinations shown in Table 2.1 are permissible; other 
combinations cause address error exceptions. See Appendix A for 
individual descriptions of CPU load and store instructions. 



Table 2.1 Byte Access within a Doubleword 
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Computational Instructions 

Computational instructions can be either: 1) in register (R-type) format, 
in which both operands are registers, or 2) in immediate (I-type) format, in 
which one operand is a 16-bit immediate. 

Computational instructions perform the following operations on register 
values: 

• arithmetic 

• logical 

• shift 

• multiply 

• divide 

These operations fit in the following four categories of computational 
instructions: 

• ALU Immediate instructions 

• three-Operand Register-Type instructions 

• shift instructions 

• multiply and divide instructions 

64-bit Virtual Address Operations with 32-bit operands 

Operands to 32-bit operand opcodes must be in sign-extended form. 32- 
bit operand opcodes include all non-doubleword operations, such as: ADD, 
ADDU, SUB, SUBU, ADDI, SLL, SRL, SRA, SLLV, etc. The result of 
operations that use incorrect sign-extended 32-bit values is unpredictable. 

Cycle Timing for Multiply and Divide Instructions 

MFHI and MFLO instructions (described in Appendix A) are interlocked 
so that any attempt to read them before prior multiply or divide 
instructions complete delays the execution of these instructions until the 
prior instructions finish. 

Table 2.2 gives the number of processor cycles (PCycles) required to 
resolve an interlock or stall between various multiply or divide 
instructions, and a subsequent MFHI or MFLO instruction. 


Instruction 

R4600 

R4700 

MULT 

10 

8 

MULTU 

10 

8 

DIV 

42 

42 

DIVU 

42 

42 

DMULT 

12 

10 

DMULTU 

12 

10 

DDIV 

74 

74 

DDIVU 

74 

74 


Table 2.2 Multiply/Divide Instruction Cycle Timing 


For more information about computational instructions, refer to the 
individual instruction as described in Appendix A. 


2-4 





























CPU Instruction Set Summary 


Chapter 2 


Jump and Branch Instructions 

Jump and branch Instructions change the control flow of a program. All 
jump and branch instructions occur with a delay of one instruction: that 
is, the instruction immediately following the jump or branch (this is known 
as the instruction in the delay slot) always executes while the target 
instruction is being fetched from storage. 

Overview of Jump Instructions 

Subroutine calls in high-level languages are usually implemented with 
Jump or Jump and Link instructions, both of which are J-type 
instructions. In J-type format, the 26-bit target address shifts left 2 bits 
and combines with the high-order 4 bits of the current program counter to 
form an absolute address. 

Returns, dispatches, and large cross-page jumps are usually 
implemented with the Jump Register or Jump and Link Register 
instructions. Both are R-type instructions that take the 32-bit or 64-bit 
byte address contained in one of the general purpose registers. 

For more information about jump instructions, refer to the individual 
instruction as described in Appendix A. 

Overview of Branch Instructions 

All branch instruction target addresses are computed by adding the 
address of the instruction in the delay slot to the 16-bit offset (shifts left 
2 bits and is sign-extended to 32 bits). All branches occur with a delay of 
one instruction. 

If a conditional branch likely is not taken, the instruction in the delay 
slot is nullified. For regular conditional branches, the delay slot is always 
executed. 

For more information about branch instructions, refer to the individual 
instruction as described in Appendix A. 

Special Instructions 

Special instructions allow the software to initiate traps; they are always 
R-type. For more information about special instructions, refer to the 
individual instruction as described in Appendix A. 

Exception Instructions 

Exception instructions are extensions to the MIPS ISA. For more 
information about exception instructions, refer to the individual 
instruction as described in Appendix A. 

Coprocessor Instructions 

Coprocessor instructions perform operations in their respective 
coprocessors. Coprocessor loads and stores are I-type, and coprocessor 
computational instructions have coprocessor-dependent formats. 

Individual coprocessor instructions are described in Appendices A (for 
CPO) and B (for the FPU, CPI). 

CPO instructions perform operations specifically on the System Control 
Coprocessor registers to manipulate the memory management and 
exception handling facilities of the processor. Appendix A contains details 
of the CPO instructions. 
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Introduction 

This chapter describes the basic operation of the CPU pipeline, which 
includes descriptions of the delay instructions (instructions that follow a 
branch or load instruction in the pipeline), interruptions to the pipeline 
flow caused by interlocks and exceptions, and R4600/R4700 
implementation of an uncached store buffer. The FPU pipeline is described 
in a later chapter. 

CPU Pipeline Operation 

The R4600/R4700 uses a 5-stage pipeline similar to the R3000. The 
simplicity of this pipeline allows the R4600/R4700 to be lower cost and 
lower power than super-scalar or super-pipelined processors. Unlike the 
R3000, the R4600/R4700 does virtual to physical translation in parallel 
with cache access. This allows the R4600/R4700 to operate at over twice 
the frequency of the R3000 and to support a larger TLB for address 
translation. 

Compared to the 8-stage R4000 pipeline, the R4600/R4700 is more 
efficient (requires fewer stalls). 

Once the pipeline has been filled, five instructions are executed 
simultaneously. Figure 3.1 shows the five stages of the instruction 
pipeline; the next section describes the pipeline stages. 


11 

21 

1 R 

2R 

1 A 

2A 

ID 

2D 

1W 

2W 

















ii 

21 

1R 

2R 

1 A 

2A 

■ 


1W 

2W 



one cycle 

► 

Figure Legend 


1 1-1 R 

Instruction cache access 

2R 

Instruction decode 

21 

Instruction virtual to physical address translation in ITLB 

1A-2A 

Integer add, logical, shift 

2A-2D 

Data cache access and load align 

1A 

Data virtual address calculation 

ID 

Data virtual to physical address translation in DTLB 

2A 

Store align 

1D-2D 

Virtual to physical address translation in JTLB 

1A 

Branch decision 

2R 

Register file read 

2W 

Register file write 

2R 

Bypass calculation 




Figure 3. 1 Instruction Pipeline Stages 
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CPU Pipeline Stages 

This section describes each of the phases of the five pipeline stages. 
Each stage has 2 phases: 

• II - Instruction Fetch, Phase one 

• 21 - Instruction Fetch, Phase two 

• 1R - Register Fetch, Phase one 

• 2R - Register Fetch, Phase two 

• 1A - Execution, Phase one 

• 2A - Execution, Phase two 

• ID - Data Fetch, Phase one 

• 2D - Data Fetch, Phase two 

• 1W - Write Back, Phase one 

• 2W - Write Back, Phase two 

II - Instruction Fetch, Phase one 

During the II phase the instruction address translation begins in the 
ITLB. 

21 - Instruction Fetch, Phase two 

During the 21 phase, the instruction cache fetch begins and the 
instruction address translation in the ITLB continues. 

1R - Register Fetch, Phase one 

During the 1R phase, the following occurs: 

• The instruction cache fetch finishes. 

• The instruction cache tag is checked against the page frame number 
obtained from the ITLB. 

2R - Register Fetch, Phase two 

During the 2R phase, the following occurs: 

• The instruction decoder decodes the instruction. 

• Any required operands are fetched from the register file. 

• Make a decision to either issue or slip (for an interlock condition). 

• For a branch, the branch address is calculated. 

IA - Execution, Phase one 

During the 1A phase, one of the following occurs: 

• Any result from the A or D stages are bypassed. 

• The arithmetic logic unit (ALU) starts the integer arithmetic, logical or 
shift operation. 

• The ALU calculates the data virtual address for load and store in- 
structions. 

• The ALU determines whether the branch condition is true. 

2A - Execution, Phase two 

During the 2A phase, one of the following occurs: 

• The integer arithmetic, logical or shift operation will complete. 

• A data cache access will start. 

• Store data is shifted to the specified byte position(s). 

• The data virtual to physical address translation in the DTLB will start. 

ID - Data Fetch, Phase one 

During the ID phase, one of the following occurs: 

• The data cache access will continue. 

• The data address translation in the DTLB completes. 

• The virtual to physical address translation in the JTLB will start. 
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2D - Data Fetch, Phase two 

During the 2D phase, one of the following occurs: 

• The data cache access will finish and the data is shifted down and ex- 
tended. 

• The virtual to physical address translation in the JTLB will finish. 
The data cache tag is checked against the PFN from the DTLB or JTLB 
for any data cache access. 

1W - Write Back, Phase one 

This phase is used internally by the processor to resolve all exceptions, 
in preparation for the register file write. 

2W - Write Back, Phase two 

For register-to-register and load instructions, the result is written back 
to the register file during the 2W stage. Branch instructions perform no 
operation during this stage. 

Figure 3.2 shows the activities occurring during each ALU pipeline 
stage, for load, store, and branch instructions. 
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Figure 3.2 CPU Pipeline Activities 
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Branch Delay 

The CPU pipeline has a branch delay of one cycle and a load delay of one 
cycle. The one-cycle branch delay is a result of the branch decision logic 
operating during the 1A pipeline phase of the branch instruction. This 
allows the branch target address calculated in the previous phase to be 
used for the instruction access in the following II phase. The pipeline will 
begin the fetch of the branch path as well as the fall-through path in the 
cycle following the delay slot. After the branch decision is made, the 
processor will continue with the fetch of either the branch path (for a taken 
branch) or the fall-through path (for the non-taken branch). 

Figure 3.3 illustrates the branch delay. 
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Figure 3.3 CPU Pipeline Branch Delay 


Load Delay 

The completion of a load at the end of the 2D pipeline phase produces 
an operand that is available for the 1A pipeline phase of the instruction 
following the load delay slot. 

Figure 3.4 shows the load delay of one pipeline cycle. 
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Figure 3.4 CPU Pipeline Load Delay 
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Interlock and Exception Handling 

Smooth pipeline flow is interrupted when cache misses or exceptions 
occur, or when data dependencies are detected. Interruptions handled 
using hardware, such as cache misses, are referred to as interlocks, while 
those that are handled using software are called exceptions. 

There are two types of interlocks: 

• stalls, which are resolved by halting the pipeline 

• slips, which require the back end of the pipeline to advance while the 
front end of the pipeline is held static 

At each cycle, exception and interlock conditions are checked for all 
active instructions. 

Because each exception or interlock condition corresponds to a 
particular pipeline stage, a condition can be traced back to the particular 
instruction in the exception/interlock stage, as shown in Figure 3.5. For 
instance, a Reserved Instruction (RI) exception is raised in the execution 
(A) stage. 
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Figure 3.5 Correspondence of Pipeline Stage to Interlock Condition 

For a description of the pipeline interlocks and exceptions listed in 
Figure 3.5, refer to Table 3. 1 and Table 3.2, which follow. 
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Table 3. 1 and Table 3.2 describe the pipeline interlocks and exceptions 
listed in Figure 3.5. 


Exception 

Description 

ITLB 

Instruction Translation or Address Exception 

Intr 

External Interrupt 

IBE 

Instruction Bus Error 

RI 

Reserved Instruction 

BP 

Breakpoint 

SC 

System Call 

CUn 

Coprocessor Unusable 

IPErr 

Instruction Parity Error 

OVF 

Integer Overflow 

FPE 

FP Interrupt 

ExTrap 

EX Stage Traps 

DTLB 

Data Translation or Address Exception 

TLBMod 

TLB Modified 

DBE 

Data Bus Error 

DPErr 

Data Parity Error 

NMI 

Non-maskable Interrupt (or Soft Reset) 

Reset 

Reset 


Table 3. 1 Pipeline Exceptions 


Interlock 

Description 

ITM 

Instruction TLB Miss 

ICM 

Instruction Cache Miss 

CPE 

Coprocessor Possible Exception 

DCM 

Data Cache Miss 

LDI 

Load Interlock 

MDSt 

Multiply/Divide Start 

FCBsy 

FP Coprocessor Busy 


Table 3.2 Pipeline Interlocks 


Exception Conditions 

When an exception condition occurs, the relevant instruction and all 
those that follow it in the pipeline are cancelled. Accordingly, any stall 
conditions and any later exception conditions that may have referenced 
this instruction are inhibited; there is no benefit in servicing stalls for a 
cancelled instruction. 


3-6 



















































The CPU Pipeline 


Chapter 3 


When an exceptional condition Is detected for an Instruction, the 
R4600/R4700 will kill it and all following Instructions. When this 
instruction reaches the W stage, the exception flag causes it to write 
various CPO registers with the exception state, change the current PC to 
the appropriate exception vector address and clear the exception bits of 
earlier pipeline stages. 

This implementation allows all preceding instructions to complete 
execution and prevents all subsequent instructions from completing. Thus 
the value in the EPC is sufficient to restart execution. It also ensures that 
exceptions are taken in the order of execution; an instruction taking an 
exception may itself be killed by an instruction further down the pipeline 
that takes an exception in a later cycle. 

Figure 3.6 shows the exception detection procedure (e.g., a reserved 
instruction exception). 



Exception Vector Address 


Figure 3.6 Exception Detection 


Stall Conditions 

Stalls are used to stop the pipeline for conditions detected after the R 
pipe-stage. When a stall occurs, the processor will resolve the condition 
and then the pipeline will continue. 
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Figure 3.7 shows a data cache miss stall. 
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Figure 3.7 Data Cache Miss 


The data cache miss is detected in the D pipe stage. If the cache line to 
be replaced is dirty — the W bit is set — the data is moved to the internal 
write buffer in the next cycle. The first doubleword of data is returned to 
the cache in 3 and the pipeline will then restart. The remainder of the 
cache line is returned in the subsequent cycles. The data to be written 
back will be returned to memory some time after the entire new cache line 
is returned. 


Slip Conditions 

During the 2R and 1A pipe-stages, internal logic will determine whether 
it is possible to start the current instruction in this cycle. If all of the source 
operands are available (either from the register file or via the internal 
bypass logic) and all the hardware resources necessary to complete the 
instruction will be available at the necessary time(s), then the instruction 
“issues”; otherwise, the instruction will "slip”. Slipped instructions are 
retried on subsequent cycles until they issue. The backend of the pipeline 
(stages D and W) will advance normally during slips in an attempt to 
resolve the conflict. “NOPS” will be inserted into the bubble in the pipeline. 
Instructions killed by branch likely instructions, ERET or exceptions will 
not cause slips. 
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Figure 3.8 shows an instruction cache miss. 




Detect Cache Miss 

Get entire cache line into cache 

Continue pipeline 
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Figure 3.8 Instruction cache miss 

Instruction cache misses are detected in R as shown in Figure 3.8 and 
the pipeline slips in its A stage. There can never be a writeback required 
for an instruction cache miss since dirty data can never exist in the I 
cache. Writes are not allowed to the I cache. Note that early restart is not 
employed for instruction cache misses, the requested cache line will be 
loaded into the cache in its entirety and, after that, the pipeline will restart. 

R4600/R4700 Write Buffer 

The R4600/R4700 contains a write buffer to improve the performance 
of writes to the external memory. Writes to external memory, whether 
cache miss writebacks or stores to uncached or write-through addresses, 
use this on-chip write buffer. The write buffer holds up to four 64-bit 
address and data pairs. 

For a cache miss write-back, the entire buffer is used for the write-back 
data and allows the processor to proceed in parallel with the memory 
update. For uncached and write-through stores, the write buffer 
uncouples the CPU from the write to memory allowing increased 
performance over the R4000 family of processors. If the write buffer is full, 
additional stores will stall until there is room for them in the write buffer. 
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The R4600/R4700 processor provides a full-featured memory 
management unit (MMU) which uses an on-chlp Translation Lookaside 
Buffer (TLB) to translate virtual addresses into physical addresses. 

This chapter describes the processor virtual and physical address 
spaces, the virtual-to-physical address translation, the operation of the 
TLB in making these translations, and those System Control Coprocessor 
(CPO) registers that provide the software interface to the TLB. 

Translation Lookaside Buffer (TLB) 

Mapped virtual addresses are translated into physical addresses using 
an on-chip TLB. 1 The TLB is a fully associative memory that holds 48 
entries, which provide mapping to 48 odd/even page pairs (96 pages). 
When address mapping is indicated, each TLB entry is checked 
simultaneously for a match with the virtual address that is extended with 
an ASID stored in the EntryHi register. 

The address mapped to a page ranges in size from 4Kbytes to 16Mbytes, 
in multiples of 4— that is, 4K, 16K, 64K, 256K, 1M, 4M, 16M. 

Hits and Misses 

If there is a virtual address match, or hit, in the TLB, the physical page 
number is extracted from the TLB and concatenated with the offset to form 
the physical address (see Figure 4.1). 

If no match occurs (TLB miss), an exception is taken and software refills 
the TLB from the page table resident in memory. Software can write over 
a selected TLB entry or use a hardware mechanism to write into a random 
entry. 

Multiple Matches 

The R4600/R4700 does not provide any detection or shutdown 
mechanism for multiple matches in the TLB. There is no damage possible 
from this condition. The result is undefined for this condition. Software is 
expected never to allow this to occur. 

Address Spaces 

This section describes the virtual and physical address spaces and the 
manner in which virtual addresses are converted or “translated” into 
physical addresses in the TLB. 

Virtual Address Space 

The processor virtual address can be either 32- or 64-bits wide, 
depending on mode of operation (user, supervisor or kernel) and the 
setting of the corresponding extended address bit in the Status register 
(UX, SX and KX). 

• For the extended address bit = 0, addresses are 32-bits wide. 

• For the extended address bit = 1, addresses are 64-bits wide. 

Both 32-bit and 64-bit address wrap in the same way. For example, in 
64-bit mode Oxffffffffffffffff will wrap to 0x0000000000000000. While the 
R4400 slipped on shift of >32-bit or other shift variables, the R4600/ 
R4700 does not. 


1 • There are virtual-to-physical address translations that occur outside of the TLB . 
For example, addresses in ksegO and ksegl spaces are unmapped translations. In 
these spaces the physical address is 0x0000 0000 0 II VA[28:0] 
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Figure 4.1 shows the translation of a virtual address into a physical 
address. 


Virtual address 



Physical address 


Figure 4. 1 Overview of a Virtual-to-Physical Address Translation 

As shown in Figure 4.2 and Figure 4.3, the virtual address is extended 
with an 8-bit address space identifier (ASID), which reduces the frequency 
of TLB flushing when switching contexts. This 8-bit ASID is in the CTO 
EntryHi register, described later in this chapter. The Global bit (G) is in the 
EntryLoO and EntryLol registers, described later in this chapter. 

Physical Address Space 

Using a 36-bit address, the processor physical address space 
encompasses 64Gigabytes. The section following describes the translation 
of a virtual address to a physical address. 

Virtual-to-Physical Address Translation 

Converting a virtual address to a physical address begins by comparing 
the virtual address from the processor with the virtual address in the TLB; 
there is a match when the virtual page number (VPN) of the address is the 
same as the VPN field of the entry, and either: 

• the Global (G) bit of the TLB entry is set, or 

• the ASID field of the virtual address is the same as the ASID field of 
the TLB entiy. 

This match is referred to as a TLB hit. If there is no match, a TLB Miss 
exception is taken by the processor and software is allowed to refill the TLB 
from a page table of virtual/physical addresses in memoiy. 

If there is a virtual address match in the TLB, the physical address is 
output from the TLB and concatenated with the Offset, which represents 
an address within the page frame space. The Offset does not pass through 
the TLB. 

Virtual-to-physical translation is described in greater detail throughout 
the remainder of this chapter; Figure 4. 19 on page 22 is a flow diagram of 
the process. 

The next two sections describe the 32-bit and 64-bit address 
translations. 
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32-bit Virtual Address Translation 

Figure 4.2 shows the virtual-to-physical-address translation of a 32-bit 
virtual address. 

• The top portion of Figure 4.2 shows a virtual address with a 12-bit, or 
4Kbyte, page size, labelled Offset The remaining 20 bits of the ad- 
dress represent the VPN, and index the lM-entry page table. 

• The bottom portion of Figure 4.2 shows a virtual address with a 24- 
bit, or 16Mbyte, page size, labelled Offset. The remaining 8 bits of the 
address represent the VPN, and index the 256-entry page table. 


Virtual Address with 1M (2 20 ) 4-Kbyte pages 
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Figure 4.2 32-bit Virtual Address Translation 


64-bit Virtual Address Translation 

Figure 4.3 on page 4 shows the virtual-to-physical-address translation 
of a 64-bit virtual address. This figure illustrates the two extremes in the 
range of possible page sizes: a 4Kbyte page (12 bits) and a 16Mbyte page 
(24 bits). 

• The top portion of Figure 4.3 shows a virtual address with a 
12-bit, or 4Kbyte, page size, labelled Offset The remaining 28 bits of 
the address represent the VPN, and index the 256M-entry page table. 

• The bottom portion of Figure 4.3 shows a virtual address with a 24- 
bit, or 16Mbyte, page size, labelled Offset The remaining 16 bits of 
the address represent the VPN, and index the 64K-entry page table. 
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Virtual Address with 256M (2 28 ) 4-Kbyte pages 
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Figure 4.3 64-bit Virtual Address Translation 


Operating Modes 

The processor has three operating modes that function In both 32- and 
64-bit operations: 

• User mode 

• Supervisor mode 

• Kernel mode 

These modes are described in the next three sections. 


User Mode Operations 

In User mode, a single, uniform virtual address space — labelled User 
segment — is available; its size is: 

• 2 Gbytes (2 31 bytes) for Status.UX = 0 ( useg i) 

• 1 Tbyte (2 40 bytes) for Status.UX = 1 (xuseg) 
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Figure 4.4 shows the User mode virtual address space. 


32-bit* 64-bit 


Ox FFFF FFFF 


Ox FFFF FFFF FFFF FFFF 


Ox 8000 0000 

Address 

Error 

Ox 0000 0100 0000 0000 

Address 

Error 

Ox 0000 0000 

2 GB 
Mapped 
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Note: *For 32-bit virtual addresses, bit 31 is sign-extended through bits 63:32. 
Failure (i.e., bit 31 = 1) results in an Address Error exception. 


Figure 4.4 User Mode Virtual Address Space 


The User segment starts at address 0 and the current active user 
process resides in either useg (32-bit virtual addressing) or xuseg (in 64- 
bit virtual addressing). The TLB identically maps all references to useg/ 
xuseg from all modes, and controls cache accessibility. 

The processor operates in User mode when the Status register contains 
the following bit-values: 

• KSU bits = 10 2 

• EXL = 0 

• ERL — 0 

In conjunction with these bits, the UX bit in the Status register selects 
between 32- or 64-bit User virtual addressing as follows: 

• when UX = 0, 32-bit useg space is selected 

• when UX = 1, 64-bit xuseg space is selected 

Table 4. 1 lists the characteristics of the two user mode segments, useg 
and xuseg. 


Address Bit 
Values 

Status Register 

Bit Values 

Segment 

Name 

Address Range 

Segment Size 

KSU 

EXL 

ERL 

UX 

32-bit 

A(31) = 0 

10 2 


0 

0 

useg 

0x0000 0000 
through 

0x7FFF FFFF 

2 Gbyte 
(2 31 bytes) 

64-bit 

A(63:40) = 0 

10 2 

0 

0 

1 

xuseg 

0x0000 0000 0000 0000 
through 

0x0000 00FF FFFF FFFF 

1 Tbyte 
(2 40 bytes) 


Table 4. 1 32-bit and 64-bit User Mode Segments 


32-bit User Mode ( useg ) 

In User mode, when Status.UX = 0, User mode virtual addressing is 
compatible with the 32-bit addressing model shown in Figure 4.4, and a 2- 
Gbyte user address space is available, labelled useg. 
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All valid User mode virtual addresses have their most-significant bit 
cleared to 0; any attempt to reference an address with the most-significant 
bit set while in User mode causes an Address Error exception. 

In 32-bit User mode virtual addressing, the TLB refill exception vector is 
used for TLB misses. 

The system maps all references to useg through the TLB, and bit 
settings within the TLB entry for the page determine the cacheability of a 
reference. 

64-bit User Mode (. xuseg ) 

In User mode, when Status.UX =1, User mode virtual addressing is 
extended to the 64-bit model shown in Figure 4.4, and a 1-Tbyte user 
address space is available, labelled xuseg. 

All valid User mode virtual addresses have bits 63:40 equal to 0; an 
attempt to reference an address with bits 63:40 not equal to 0 causes an 
Address Error exception. 

The extended addressing TLB refill exception vector is used for TLB 
misses. 

Supervisor Mode Operations 

Supervisor mode is designed for layered operating systems in which a 
true kernel runs in R4600/R4700 Kernel mode, and the rest of the 
operating system runs in Supervisor mode. 

The processor operates in Supervisor mode when the Status register 
contains the following bit-values: 

• KSU = 01 2 

• EXL = 0 

• ERL = 0 

In conjunction with these bits, the SX bit in the Status register selects 
between 32- or 64-bit Supervisor mode virtual addressing: 

• when SX = 0, 32-bit supervisor space virtual addressing is selected 

• when SX = 1, 64-bit supervisor space virtual addressing is selected 
Figure 4.5 shows Supervisor mode address mapping. Table 4.2, which 

follows the figure, lists the characteristics of the supervisor mode 
segments: descriptions of the address spaces follow. 



32-bit* 


Ox FFFF FFFF 

Ox EDOO 0000 

Address 

error 


Ox GOOO 0000 

0.5 GB 
Mapped 

sseg 

Ox MXX) 0000 

Address 

error 


Ox 8000 0000 

Address 

error 



2 GB 
Mapped 

suseg 

Ox 0000 0000 




Ok FFFF FFFF FFFF FFFF 
Ox FFFF FFFF EDOO 0000 

Ox FFFF FFFF CDOO 0000 

Ox 4000 0100 0000 0000 

Ox 4000 0000 0000 0000 

Ox 0000 0100 0000 0000 

Ox 0000 0000 0000 0000 


64-bit 


Address 

error 


0.5 GB 
Mapped 


Address 

error 


1 TB 
Mapped 


Address 

error 


1 TB 
Mapped 


csseg 


xsseg 


xsuseg 


Note: *In 32-bit virtual addressing, bit 31 is sign-extended through bits 

63:32. Failure results in an Address Error exception. 


Figure 4.5 Supervisor Mode Virtual Address Space 
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Address Bit 

Status Register 

Bit Values 

Segment 

Name 



Values 

KSU 

EXL 

ERL 

SX 

Address Range 

32-bit 

A(31) = 0 

01 2 

0 

0 

0 

suseg 

0x0000 0000 
through 

0x7FFF FFFF 

2 Gbytes 
(2 31 bytes) 

32-bit 

A(31:29) = 110 2 

01 2 

0 

0 

0 

sseg 

OxCOOO 0000 
through 

OxDFFF FFFF 

512 Mbytes 
(2 29 bytes) 

64-bit 

A(63:62) = 00 2 



0 

1 

xsuseg 

0x0000 0000 0000 0000 
through 

0x0000 OOFF FFFF FFFF 

1 Tbyte 
(2 40 bytes) 

64-bit 

A(63:62) = 01 2 

01 2 



1 

xsseg 

0x4000 0000 0000 0000 
through 

0x4000 OOFF FFFF FFFF 

1 Tbyte 
(2 40 bytes) 

64-bit 

A(63:62) = 1 1 2 


0 

0 

1 

csseg 

OxFFFF FFFF C000 0000 
through 

OxFFFF FFFF DFFF FFFF 

512 Mbytes 
(2 29 bytes) 


Table 4.2 32-bit and 64-bit Supervisor Mode Segments 


32-bit Supervisor Mode, User Space (suseg) 

In Supervisor mode, when Status. SX = 0 and the most-significant bit of 
the 32-bit virtual address is set to 0, the suseg virtual address space is 
selected; it covers the full 2 31 bytes (2Gbytes) of the current user address 
space. The virtual address is extended with the contents of the 8-bit ASID 
field to form a unique virtual address. 

This mapped space starts at virtual address 0x0000 0000 and runs 
through Qx7FFF FFFF. 

32-bit Supervisor Mode, Supervisor Space (sseg) 

In Supervisor mode, when Status.SX = 0 and the three most-significant 
bits of the 32 -bit virtual address are 1 10 2 , the sseg virtual address space 
is selected; it covers 2 29 -bytes (512Mbytes) of the current supervisor 
address space. The virtual address is extended with the contents of the 8- 
bit ASID field to form a unique virtual address. 

This mapped space begins at virtual address OxCOOO 0000 and runs 
through OxDFFF FFFF. 

64-bit Supervisor Mode, User Space ( xsuseg ) 

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual 
address are set to 00 2 , the xsuseg virtual address space is selected; it 
covers the full 2 40 bytes ( 1Tbyte) of the current user address space. The 
virtual address is extended with the contents of the 8-bit ASID field to form 
a unique virtual address. 

This mapped space starts at virtual address 0x0000 0000 0000 0000 
and runs through 0x0000 OOFF FFFF FFFF. 

64-bit Supervisor Mode, Current Supervisor Space (xsseg) 

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual 
address are set to 01 2 , the xsseg current supervisor virtual address space 
is selected. The virtual address is extended with the contents of the 8-bit 
ASID field to form a unique virtual address. 

This mapped space begins at virtual address 0x4000 0000 0000 0000 
and runs through 0x4000 OOFF FFFF FFFF. 
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64-bit Supervisor Mode, Separate Supervisor Space (cssep) 

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual 
address are set to 1 1 2 , the csseg separate supervisor virtual address space 
is selected. Addressing of the csseg is compatible with addressing sseg in 
32-bit mode. The virtual address is extended with the contents of the 8- 
bit ASID field to form a unique virtual address. 

This mapped space begins at virtual address QxFFFF FFFF C000 0000 
and runs through OxFFFF FFFF DFFF FFFF. 

Kernel Mode Operations 

The processor operates in Kernel mode when the Status register 
contains one of the following values: 

• KSU = 00 2 

• EXL = 1 

• ERL = 1 

In conjunction with these bits, the KX bit in the Status register selects 
between 32- or 64-bit Kernel mode addressing: 

• when KX = 0, 32-bit kernel space virtual addressing is selected 

• when KX = 1, 64-bit kernel space virtual addressing is selected 

The processor enters Kernel mode whenever an exception is detected 
and it remains in Kernel mode until an Exception Return (ERET) 
instruction is executed. The ERET instruction restores the processor to 
the mode existing prior to the exception. 

Kernel mode virtual address space is divided into regions differentiated 
by the high-order bits of the virtual address, as shown in Figure 4.6. 
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32-bit* 


64-bit 


Qx FFFF FFFF 

0.5 GB 
Mapped 

Ox FFFF FFFF FFFF FFFF 

kseg3 0x 0000 0000 

0.5 GB 
Mapped 

ckseg3 

Qx E000 0000 

0.5 GB 

cksseg 


0.5 GB 
Mapped 

Ox FFFF FFFF C000 0000 

Mapped 

Qx C000 0000 

ksseg 

0.5 GB 
Unmapped 
Uncacned 

cksegl 


0.5 GB 
Unmapped 
Uncached 





ksegl 0x pppp pppp 8000 0000 

0.5 GB ”1 
Unmapped 
Cached 

cksegO 

Ox A000 0000 


Address 

error 



0.5 GB 

Ox C000 OOFF 8000 0000 


Qx 8000 0000 

Unmapped 

Cached 

ksegO 

Ox C000 0000 0000 0000 

Mapped 

xkseg 



Ox 8000 0000 0000 0000 

Unmapped 

xkphys 



Qx 4000 0100 0000 0000 

Address 

error 



2 GB 

Mapped 

kuseg 

Ox 4000 0000 0000 0000 

1 TB 
Mapped 

xksseg 




Address 




Ox 0000 0100 0000 0000 

(V nnnn nnnn nnnn nnnn 

error 


Ox 0000 0000 


1 TB 
Mapped 

xkuseg 



ux uuuu uuuu uuuu uuuu 



Note: *In 32-bit virtual addressing, bit 31 is sign-extended through bits 63:32. 
results in an Address Error exception. 

Failure 


Figure 4.6 Kernel Mode Address Space 
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Table 4.3 lists the characteristics of the 32-bit kernel mode segments, 
and Table 4.4 lists the characteristics of the 64-bit kernel mode segments 


Address Bit 
Values 

Status Register Is 

One Of These Values 


Address Range 

Segment 

Size 



> 
CO 
1— * 

II 

o 

KSU = 00 2 

or 

EXL = 1 

or 

ERL =1 

0 

kuseg 

0x0000 0000 
through 

0x7FFF FFFF 

2 Gbytes 
(2 31 bytes) 

A(31:29) = IOO 2 

0 

ksegO 

0x8000 0000 
through 

0x9FFF FFFF 

512 

Mbytes 
(2 2 ^ bytes) 

A(3 1 :29) = 101 2 

0 

ksegl 

OxAOOO 0000 
through 

OxBFFF FFFF 

512 

Mbytes 
(2 2 ^ bytes) 

A(3 1 :29) = 110 2 

0 

ksseg 

OxCOOO 0000 
through 

OxDFFF FFFF 

512 

Mbytes 
(2 2 ^ bytes) 

A(31:29) = 111 2 

0 

kseg3 

OxEOOO 0000 
through 

OxFFFF FFFF 

512 

Mbytes 
(2 25 bytes) 


Table 4.3 32-bit Kernel Mode Segments 


32-bit Kernel Mode, User Space ( kuseg ) 

In Kernel mode, when Status.KX = 0, and the most-significant bit of the 
virtual address, A31, is cleared, the 32-bit kuseg virtual address space is 
selected; it covers the full 2 31 bytes (2 Gbytes) of the current user address 
space. The virtual address is extended with the contents of the 8-bit ASID 
field to form a unique virtual address. 

32-bit Kernel Mode, Kernel Space 0 (ksegO) 

In Kernel mode, when Status.KX = 0 and the most-significant three bits 
of the virtual address are 100 2 , 32-bit ksegO virtual address space is 
selected; it is the current 2 29 -byte (512-Mbyte) kernel physical space. 

References to ksegO are not mapped through the TLB; the physical 
address selected is defined by subtracting 0x8000 0000 from the virtual 
address (physical address = 0x0000 0000 0 I I VA[28:0]). 

The KO field of the Config register, described in this chapter, controls 
cacheability and coherency. 

32-bit Kernel Mode, Kernel Space 1 ( ksegl ) 

In Kernel mode, when Status.KX = 0 and the most-significant three bits 
of the 32 -bit virtual address are 10 1 2 , 32-bit ksegl virtual address space 
is selected; it is the current 2 29 -byte (512Mbyte) kernel physical space. 

References to ksegl are not mapped through the TLB; the physical 
address selected is defined by subtracting GxAOOO 0000 from the virtual 
address (physical address = 0x0000 0000 0 I I VA[28:01). 

Caches are disabled for accesses to these addresses, and physical 
memory (or memoiy-mapped I/O device registers) are accessed directly. 

32-bit Kernel Mode, Supervisor Space (ksseg) 

In Kernel mode, when Status.KX = 0 and the most-significant three bits 
of the 32 -bit virtual address are 110 2 , the ksseg virtual address space is 
selected; it is the current 2 29 -byte (512Mbyte) supervisor virtual space. 
The virtual address is extended with the contents of the 8-bit ASID field to 
form a unique virtual address. 
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32-bit Kernel Mode, Kernel Space 3 ( kseg3 ) 

In Kernel mode, when Status.KX = 0 and the most-significant three bits 
of the 32-bit virtual address are 1 1 12 , the kseg3 virtual address space is 
selected; it is the current 2 29 -byte (512Mbyte) kernel virtual space. The 
virtual address is extended with the contents of the 8-bit ASID field to form 
a unique virtual address. 


Address Bit 
Values 

Status Register Is 
One Of These Values 

Segment 

Name 

Address Range 

Segment 

Size 

KSU EXL ERL 

KX 

A(63:62) = 00 2 

KSU = 00 2 

or 

EXL = 1 

or 

ERL =1 

1 

xkuseg 

0x0000 0000 0000 0000 
through 

0x0000 OOFF FFFF FFFF 

1 Tbyte 
(2 40 bytes) 

A(63:62) = 01 2 

1 

xksseg 

0x4000 0000 0000 0000 
through 

0x4000 OOFF FFFF FFFF 

1 Tbyte 
(2 40 bytes) 

A(63:62) = 102 



0x8000 0000 0000 0000 
through 

OxBFFF FFFF FFFF FFFF 

8 2 36 -byte 
spaces 

A(63:62) = 1 1 2 

1 

xkseg 

OxCOOO 0000 0000 0000 
through 

OxCOOO OOFF 7FFF FFFF 

2 44 bytes 

A(63:62) = 1 1 2 
A(61:31) = -1 

1 

cksegO 

OxFFFF FFFF 8000 0000 
through 

OxFFFF FFFF 9FFF FFFF 

512 

Mbytes 
(2 29 bytes) 

A(63:62) = 1 1 2 
A(61:31) = -1 

1 

cksegl 

OxFFFF FFFF A000 0000 
through 

OxFFFF FFFF BFFF FFFF 

512 

Mbytes 
(2 25 bytes) 

A(63:62) = 1 1 2 
A(61:31) = -1 

1 

cksseg 

OxFFFF FFFF C000 0000 
through 

OxFFFF FFFF DFFF FFFF 

512 

Mbytes 
(2 29 bytes) 

A(63:62) = 1 1 2 
A(61:31) = -1 

1 

ckseg3 

OxFFFF FFFF E000 0000 
through 

OxFFFF FFFF FFFF FFFF 

512 

Mbytes 
(2 29 bytes) 


Table 4.4 64-bit Kernel Mode Segments 


64-bit Kernel Mode, User Space ( xkuseg 1 ) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual 
address are 00 2 , the xkuseg virtual address space is selected; it covers the 
current user address space. The virtual address is extended with the 
contents of the 8-bit ASID field to form a unique virtual address. 

As a special feature for the ECC handler, if the ERL bit of the Status 
register is set, the user address region becomes a 2 31 -byte unmapped, 
uncached space. This allows the ECC exception code to operate uncached 
using rO as a base register. 

64-bit Kernel Mode, Current Supervisor Space (xksseg) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual 
address are 01 2 , the xksseg virtual address space is selected; it is the 
current supervisor virtual space. The virtual address is extended with the 
contents of the 8-bit ASID field to form a unique virtual address. 
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64-bit Kernel Mode, Physical Spaces (xkphys) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual 
address are 10 2 , the xkphys virtual address space is selected: it is a set of 
eight 2 36 -byte kernel physical spaces. Accesses with address bits 58:36 
not equal to 0 cause an address error. 

References to this space are not mapped: the physical address selected 
is taken from bits 35:0 of the virtual address. Bits 61:59 of the virtual 
address specify the cacheability and coherency attributes, as shown in 
Table 4.5. 


Value 

(61:59) 

Cacheability and Coherency Attributes 

Starting Address 

0 

Cacheable, noncoherent, write-through, no 
write allocate 

0x8000 0000 0000 0000 

1 

Cacheable, noncoherent, write -through, write 
allocate 

0x8800 0000 0000 0000 

2 

Uncached 

0x9000 0000 0000 0000 


Cacheable, noncoherent 

0x9800 0000 0000 0000 

4-7 

Reserved 

OxAOOO 0000 0000 0000 


Table 4.5 Cacheability and Coherency Attributes 


64-bit Kernel Mode, Kernel Space (. xkseg ) 

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual 
address are 1 1 2 , the address space selected is one of the following: 

• kernel virtual space, xkseg, the current supervisor virtual space; the 
virtual address is extended with the contents of the 8-bit ASID field to 
form a unique virtual address 

• one of the four 32-bit kernel compatibility spaces, as described in the 
next section. 

64-bit Kernel Mode, Compatibility Spaces ( cksegltO , cksseg, ckseg3) 

In Kernel mode, when Status.KX = 1, bits 63:62 of the 64-bit virtual 
address are 1 1 2 , and bits 61 :3 1 of the virtual address equal “-1”, the lower 
two bytes of address, as shown in Figure 4.6, select one of the following 
512-Mbyte compatibility spaces. 

• cksegO. This 64-bit virtual address space is an unmapped region, 
compatible with the 32-bit address model ksegO. The KO field of the 
Conftg register, described in this chapter, controls cacheability and 
coherency. 

• cksegl. This 64-bit virtual address space is an unmapped and un- 
cached region, compatible with the 32-bit address model ksegl. 

• cksseg. This 64-bit virtual address space is the current supervisor 
virtual space, compatible with the 32-bit address model ksseg. 

• ckseg3. This 64-bit virtual address space is kernel virtual space, 
compatible with the 32-bit address model kseg3. 

System Control Coprocessor 

The System Control Coprocessor (CPO) is implemented as an integral 
part of the CPU, and supports memory management, address translation, 
exception handling, and other privileged operations. CPO contains the 
registers shown in Figure 4.7 plus a 48-entry TLB. The sections that follow 
describe how the processor uses each of the memory management-related 
registers. 

Each CPO register has a unique number that identifies it: this number 
is referred to as the register number. For instance, the Page Mask register 
is register number 5. 
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EntryHi 

10* 


EntryLoO 

2 * 

EntryLol 

3* 


(“Safe” entries) 

(See Random Register, 
contents of TLB Wired) 

127 0 


LLAddr 

17* 


TagLo 

28* 



TagHi 

29* 



1 k 


■ 



PRId 

15* 

- - - ! 

H5T 




Config 

16* 

| 6 ^| 


□ 


Used with memory 
management system. 


Used with exception 
processing. See 
Chapter 5 for details. 


Note: *Register number 


Figure 4.7 CPO Registers and the TLB 


Format of a TLB Entry 

Figure 4.8 shows the TLB entry formats for both 32- and 64-bit virtual 
addressing. Each field of an entry has a corresponding field in the EntryHi 
EntryLoO , EntryLol , or PageMask registers, as shown in Figure 4.9 and 
Figure 4. 10; for example the Mask field of the TLB entry is also held in the 
PageMask register. 
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256-bit TLB 
entry in 64-bit 
virtual addressing 


64-bit Virtual Addressing 


217 216 


205 204 


MASK 


191 190 189 


168 167 


141 140139 136 135 

g| 0 


70 69 67 66 65 64 



Figure 4.8 Format of a TLB Entry 

The format of the EntryHi, EntryLoO, EntryLol, and PageMask registers 
are nearly the same as the TLB entry. The one exception is the Global field 
(G bit), which is used in the TLB, but is reserved in the EntryHi register. 
Figure 4.9 and Figure 4. 10 describe the TLB entry fields that are shown in 
Figure 4.8. 



PageMask Register 


Mask.... Page comparison mask. 

0 Reserved. Must be written as zeroes, and returns zeroes when read. 


EntryHi Register 


63 62 61 


64-bit 

VA 


VPN2.... Virtual page number divided by two (maps to two pages). 

ASID..... Address space ID field. An 8-bit field that lets multiple processes share the TLB; each 
process has a distinct mapping of otherwise identical virtual page numbers. 

R Region. (00 -> user, 01 -» supervisor, 11 -» kernel) used to match vAddr 63...62 

Fill Reserved. Returns zero when read, ignored on writes. 

0 Reserved. Must be written as zeroes, and returns zeroes when read. 


Figure 4.9 Fields of the PageMask and EntryHi Registers 
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EntryLoO and EntryLol Registers 


63 30 29 6 5 3 2 1 0 


64-bit 

VA 

0 

PFN 

c 


[v 

\D 


34 

24 

3 

1 

1 

1 

6 

3 30 

29 6 

5 3 

2 

1 

0 

64-bit 

VA 

0 

PFN 

c 

ll 

3 

iD 


34 

24 

3 

1 

1 

i 


PFN Page frame number; the upper bits of the physical address. 

C Specifies the TLB page coherency attribute; see Table 4.6. 

D Dirty. If this bit is set, the page is marked as dirty and, therefore, writable. This bit is 

actually a write-protect bit that software can use to prevent alteration of data. 

V. Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLBL or TLBS 

miss occurs. 

G Global. If this bit is set in both LoO and Lol, then the processor ignores the ASID during 

TLB lookup. 

0 Reserved. Must be written as zeroes, and returns zeroes when read. 


Figure 4.10 Fields of the EntryLoO and EntryLol Registers 


The TLB page coherency attribute [Q bits specify whether references to 
the page should be cached; if cached, the algorithm selects between several 
coherency attributes. Table 4.6 shows the coherency attributes selected 
by the C bits. 


C(5:3) Value 

Page Coherency Attribute 

0 

Cacheable, noncoherent, write-through, no write allocate 

1 

Cacheable, noncoherent, write- through, write allocate 

2 

Uncached 

3 

Cacheable, noncoherent, write-back 

4 - 7 

Reserved 


Table 4.6 TLB Page Coherency (C) Bit Values 


CPO Registers 

The following sections describe the CPO registers (shown in Figure 4.7 
on page 13) that are assigned specifically as a software interface with 
memory management (each register is followed by its register number in 
parentheses). 

• Index register (CPO register number 0) 

• Random register ( 1 ) 

• EntryLoO (2) and EntryLol (3) registers 

• PageMask register (5) 

• Wired register (6) 

• Entry Hi register ( 1 0) 

• PRId register ( 1 5) 

• Conjig register ( 1 6) 

• LLAddr register { 1 7) 

• TagLo (28) and TagHi (29) registers 
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Index Register (0) 

The Index register is a 32-bit, read/write register containing six bits to 
index an entry in the TLB. The high-order bit of the register shows the 
success or failure of a TLB Probe (TLBP) instruction. 

The Index register also specifies the TLB entry affected by TLB Read 
(TLBR) or TLB Write Index (TLBWI) instructions. 

Figure 4. 11 shows the format of the Index register; Table 4.7, which 
follows the figure, describes the Index register fields. 


31 

30 

Index Register 

6 

5 0 

■ 

E 

0 

■291 



25 


6 


Figure 4. 1 1 Index Register 


Field 

Description 

P 

Probe failure. Set to 1 when the previous TLBProbe 
(TLBP) instruction was unsuccessful. 

Index 

Index to the TLB entry affected by the TLBRead and 
TLBWrite instructions 

0 

Reserved. Must be written as zeroes, and returns 
zeroes when read. 


Table 4.7 Index Register Field Descriptions 


Random Register (1) 

The Random register is a read-only register of which six bits index an 
entry in the TLB. This register decrements as each instruction executes, 
and its values range between an upper and a lower bound, as follows: 

• A lower bound is set by the number of TLB entries reserved for exclu- 
sive use by the operating system (the contents of the Wired register). 

• An upper bound is set by the total number of TLB entries. Thus the 
upper bound is 47 (The TLB entries are number from 0 to 47). 

The R4600/R4700 implements this register differently from the 
R4000: The R4000 counts both valid and invalid instructions, while the 
R4600/R4700 counts only valid instructions. 

The Random register specifies the entry in the TLB that is affected by the 
TLB Write Random instruction. The register does not need to be read for 
this purpose; however, the register is readable to verify proper operation of 
the processor. 

To simplify testing, the Random register is set to the value of the upper 
bound upon system reset. This register is also set to the upper bound 
when the Wired register is written. 

Figure 4. 12 shows the format of the Random register; Table 4.8 on 
page 17 describes the Random register fields. 



Random Register 



31 


6 

5 0 

■ 

0 

■n 


26 


6 


Figure 4.12 Random Register 
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Field 

Description 

Random 

TLB random index 

0 

Reserved. Must be written as zeroes, and returns zeroes when read. 


Table 4.8 Random Register Field Descriptions 


Entry LoO (2), and Entry Lol (3) Registers 

The EntryLo register consists of two registers that have Identical 
formats: 

• EntryLoO is used for even virtual pages. 

• EntryLol is used for odd virtual pages. 

The EntryLoO and EntryLol registers are read/write registers. They 
hold the physical page frame number (PFN) of the TLB entry for even and 
odd pages, respectively, when performing TLB read and write operations. 
Figure 4.10 on page 15 shows the format of these registers. 

PageMask Register (5) 

The PageMask register is a read/write register used for reading from or 
writing to the TLB; it holds a comparison mask that sets the variable page 
size for each TLB entry, as shown in Table 4.9. 

TLB read and write operations use this register as either a source or a 
destination: when virtual addresses are presented for translation into 
physical address, the corresponding bits in the TLB identify which virtual 
address bits among bits 24: 13 are used in the comparison. 

When the Mask field is not one of the values shown in Table 4.9, the 
operation of the TLB is undefined. 



Bit 

Page Size 

B 

IB 




i 

i 

□ 

l 

l 

l 

l 


□ 

5 




9 

8 

H 

6 

5 

4 

3 

4 Kbytes 

H 

n 

H 

B 

B 

B 

H 

n 

mm 

Kg 

B 

m 

16 Kbytes 

m 

B 

H 

B 

b 

n 

El 

B 

B 

B 

B 

B 

64 Kbytes 

m 

u 

0 

0 

El 

B 

SI 

B 

B 

B 

B 

B 

256 Kbytes 


SI 

B 

SI 

El 

El 

n 

B 

B 

B 

B 

B 

1 Mbyte 

11 

b 

B 

B 

n 

1 

n 

B 

B 

B 

B 

B 

4 Mbytes 


b 

n 

n 

n 

n 

D 

B 

B 

B 

B 

B 

16 Mbytes 

D 

n 

D 

si 

n 

n 

n 

B 

B 

B 

B 

B 


Table 4.9 Mask Field Values for Page Sizes 
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Wired Register (6) 

The Wired register is a read/write register that specifies the boundary 
between the wired and random entries of the TLB, as shown in Figure 4. 13. 
Wired entries are nonreplaceable entries, which cannot be overwritten by 
a TLB write random operation. Random entries can be overwritten. 


TLB 


Range of Wired entries 


Figure 4. 13 Wired Register Boundary 

The Wired register is set to 0 upon system reset. Writing this register 
also sets the Random register to the value of its upper bound (see Random 
register, above). Figure 4.14 shows the format of the Wired register; 
Table 4.10, which follows the figure, describes the register fields. 
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Figure 4.14 Wired Register 


Field 

Description 

Wired 

TLB Wired boundary (the number of wired TLB entries) 

0 

Reserved. Must be written as zeroes, and returns zeroes 
when read. 


Table 4. 10 Wired Register Field Descriptions 


EntryHi Register (CPO Register 10) 

The EntryHi register holds the high-order bits of a TLB entry for TLB 
read and write operations. 

The EntryHi register is accessed by the TLB Probe, TLB Write Random, 
TLB Write Indexed, and TLB Read Indexed instructions. 

Figure 4.9 shows the format of this register. 

When either a TLB refill, TLB invalid, or TLB modified exception occurs, 
the EntryHi register is loaded with the virtual page number (VPN2) and the 
ASID of the virtual address that did not have a matching TLB entry. (See 
Chapter 5 for more information about these exceptions.) 
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Processor Revision Identifier (PRId) Register (IS) 

The 32 -bit, read-only Processor Revision Identifier ( PRId) register 
contains information identifying the implementation and revision level of 
the CPU and CPO. Figure 4.15 shows the format of the PRId register; 
Table 4.11 describes the PRId register fields. 




PRId Register 



31 


1615 

87 0 


0 

Imp 

Rev | 


16 


8 


8 


Figure 4.15 Processor Revision Identifier Register Format 


Field 

Description 

Imp 

Implementation number R4600: Imp = 0x20 

R4700: Imp = 0x21 

Rev 

Revision number 

0 

Reserved. Must be written as zeroes, and returns zeroes when read. 


Table 4. 1 1 PRId Register Fields 


The low-order byte (bits 7:0) of the PRId register is interpreted as a 
revision number, and the high-order byte (bits 15:8) is interpreted as an 
implementation number. The implementation number of the R4600/ 
R4700 processor is 0x20. The content of the high-order halfword (bits 
31:16) of the register are reserved. 

The revision number is stored as a value in the form y.x, where y is a 
major revision number in bits 7:4 and x is a minor revision number in bits 
3:0. 

The revision number can distinguish some chip revisions, however there 
is no guarantee that changes to the chip will necessarily be reflected in the 
PRId register, or that changes to the revision number necessarily reflect 
real chip changes. For this reason, these values are not listed and software 
should not rely on the revision number in the PRId register to characterize 
the chip. Certain attributes, such as cache size, are independent of 
implementation number. 

Config Register (16) 

The Config register specifies various configuration options selected on 
R4600/R4700 processors; Table 4. 12 lists these options. 

Some configuration options, as defined by Config bits 31:3, are set by 
the hardware during reset and are included in the Config register as read- 
only status bits for the software to access. The KO field is the only read/ 
write field (as indicated by Config register bits 2:0) and controlled by 
software: on reset these fields are undefined. 

Figure 4.16 shows the format of the Cortfig register; Table 4.12, which 
follows the figure, describes the Config register fields. 
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Figure 4. 16 Config Register Format 
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Field 

Description 

EC 

System clock ratio: 

0 -> processor clock frequency divided by 2 

1 — » processor clock frequency divided by 3 

2 — > processor clock frequency divided by 4 

3 -> processor clock frequency divided by 5 

4 -» processor clock frequency divided by 6 

5 -> processor clock frequency divided by 7 

6 -» processor clock frequency divided by 8 

7 Reserved 

EP 

Writeback data rate: 

0 -» DDDD Doubleword every cycle 

1 — > DDxDDx 2 Doublewords every 3 cycles 

2 DDxxDDxx 2 Doublewords every 4 cycles 

3 — » DxDxDxDx 2 Doublewords every 4 cycles 

4 — > DDxxxDDxxx 2 Doublewords every 5 cycles 

5 -> DDxxxxDDxxxx 2 Doublewords every 6 cycles 

6 — » DxxDxxDxxDxx 2 Doublewords every 6 cycles 

7 DDxxxxxDDxxxxx 2 Doublewords every 7 cycles 

8 — » DxxxDxxxDxxxDxxx 2 Doublewords every 8 cycles 

9-15 Reserved 

BE 

BigEndianMem 

0 -» Little endian 

1 -> Big endian 

IC 

Primary I-cache Size (I-cache size = 2 12+1C bytes). In the R4600/R4700 
processor, this is set to 16 Kbytes (IC = 010) 

DC 

Primary D-cache Size (D-cache size = 2 12+ ^ c bytes). In the R4600/R4700 
processor, this is set to 16 Kbytes (DC = 010) 

IB 

Primary I-cache line size 

1 32 bytes (8 Words) 

DB 

Primary D-cache line size 
l-> 32 bytes (8 Words) 

KO 

ksegO coherency algorithm (see EntryLoO and EntryLol registers) 

Others 

Reserved. Returns indicated values when read. 


Table 4.12 Config Register Fields 


Load Linked Address (LLAddr) Register (17) 

The read/write Load Linked Address ( LLAddr ) register contains the 
physical address read by the most recent Load Linked instruction. 

This register is for diagnostic purposes only, and serves no function 
during normal operation. 

Figure 4. 17 shows the format of the LLAddr register; PAddr represents 
bits of the physical address, PA(35:4). 
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Figure 4.17 LLAddr Register Format 


Cache Tag Registers [TagLo (28) and TagHi (29)] 

The TagLo and TagHi registers are 32-bit read/write registers that hold 
the primary cache tag and parity during cache initialization, cache 
diagnostics, or cache error processing. The Tag registers are written by the 
CACHE and MTCO instructions. 

The P field of these registers is ignored on Index Store Tag operations. 
Parity is computed by the store operation. 

The Windows NT Operating System uses the TagLo cpO register to save/ 
restore gp registers in the TLB refill exception handler. Thus, all 32 bits 
must be present, even though they have no use for the primary purpose of 
TagLo. 

Figure 4.18 shows the format of these registers for primary cache 
operations. Table 4.13 lists the field definitions of the TagLo and TagHi 
registers. 
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Figure 4.18 TagLo and TagHi Register (P -cache) Formats 


Field 

Description 


Specifies the physical address bits 35:12 


Specifies the primary cache state 

P 

Specifies the primary tag even parity bit 

F 

The FIFO bit used to implement FIFO refill of the cache 

RWNT 

Read /Write bits required for Windows NT 

0 

Reserved. Must be written as zeroes; returns zeroes when read 


Table 4.13 Cache Tag Register Fields 
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Virtual-to-Physical Address Translation Process 

During virtual-to-physical address translation, the CPU compares the 
8-bit ASID (if the Global bit, G, is not set) of the virtual address to the ASID 
of the TLB entry to see if there is a match. 

The following comparison is also made: 

• For the 64-bit virtual addresses, the highest 15-to-27 bits (depending 
upon the page size) of the virtual address are compared to the con- 
tents of the TLB virtual page number. 

If a TLB entry matches, the physical address and access control bits (C, 
D, and V) are retrieved from the matching TLB entry. While the Vbit of the 
entry must be set for a valid translation to take place, it is not involved in 
the determination of a matching TLB entry. 

Figure 4.19 illustrates the TLB address translation process. 



Figure 4.19 TLB Address Translation 
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TLB Misses 

If there is no TLB entry that matches the virtual address, a TLB miss 
exception occurs. If the access control bits (D and V) indicate that the 
access is not valid, a TLB modification or TLB invalid exception occurs. If 
the C bits equal 010 2 , the physical address that is retrieved accesses main 
memory, bypassing the cache. 

TLB Instructions 

Table 4. 14 lists the instructions that the CPU provides for working with 
the TLB. See Appendix A for a detailed description of these instructions. 


Op Code 

Description of Instruction 

TLBP 

Translation Lookaside Buffer Probe 

TLBR 

Translation Lookaside Buffer Read 

TLBWI 

Translation Lookaside Buffer Write Index 

TLBWR 

Translation Lookaside Buffer Write Random 


Table 4. 14 TLB Instructions 
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This chapter describes the CPU exception processing, including an 
explanation of exception processing, followed by the format and use of 
each CPU exception register. 

The chapter concludes with a description of each exception’s cause, 
together with the manner in which the CPU processes and services these 
exceptions. For information about Floating-Point Unit exceptions, see 
Chapter 7. 

How Exception Processing Works 

The processor receives exceptions from a number of sources, including 
translation lookaside buffer (TLB) misses, arithmetic overflows, I/O 
interrupts, and system calls. When the CPU detects one of these 
exceptions, the normal sequence of instruction execution is suspended 
and the processor enters Kernel mode (see Chapter 4 for a description of 
system operating modes). 

The processor then disables interrupts and forces execution of a 
software exception processor (called a handler) located at a fixed address. 
The handler may save the context of the processor, including the contents 
of the program counter, the current operating mode (User or Supervisor), 
and the status of the interrupts (enabled or disabled). This context would 
be saved so it can be restored when the exception has been serviced. 

When an exception occurs, the CPU loads the Exception Program 
Counter ( EPQ register with a location where execution can restart after the 
exception has been serviced. The restart location in the EPC register is the 
address of the instruction that caused the exception or, if the instruction 
was executing in a branch delay slot, the address of the branch instruction 
immediately preceding the delay slot. 

The registers described later in the chapter assist in this exception 
processing by retaining address, cause and status information. 

For a description of the exception handling process, see the description 
of the individual exception contained in this chapter, or the flowcharts at 
the end of this chapter. 

Exception Processing Registers 

This section describes the CPO registers that are used in exception 
processing. Table 5. 1 on page 5-2 lists these registers, along with their 
number — each register has a unique identification number that is referred 
to as its register number. For instance, the ECC register is register number 
26. The remaining CPO registers are used in memory management as 
described in Chapter 4. 

Software examines the CPO registers during exception processing to 
determine the cause of the exception and the state of the CPU at the time 
the exception occurred. The registers in Table 5. 1 are used in exception 
processing, and are described in the sections that follow. 
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Register Name 

Reg. No. 

Context 

4 

BadVAddr (Bad Virtual Address) 

8 

Count 

9 

Compare register 

11 

Status 

12 

Cause 

13 

EPC (Exception Program Counter) 

14 

XContext 

20 

ECC 

26 

CacheErr (Cache Error and Status) 

27 

ErrorEPC (Error Exception Program Counter) 

30 


Table 5.1 CPO Exception Processing Registers 


Context Register (4) 

The Context register is a read/write register containing the pointer to an 
entry in the page table entry (PTE) array; this array is an operating system 
data structure that stores virtual-to-physical address translations. When 
there is a TLB miss, the CPU loads the TLB with the missing translation 
from the PTE array. Normally, the operating system uses the Context 
register to address the current page map which resides in the kernel- 
mapped segment, kseg3. The Context register duplicates some of the 
information provided in the BadVAddr register, but the information is 
arranged in a form that is more useful for a software TLB exception 
handler. Figure 5. 1 shows the format of the Context register; Table 5.2, 
which follows the figure, describes the Context register fields. 
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Figure 5. 1 Context Register Format 


Field 

Description 

BadVPN2 

This field is written by hardware on a miss. It contains 
the virtual page number (VPN) of the most recent virtual 
address that did not have a valid translation. 

PTEBase 

This field is a read/write field for use by the operating 
system. It is normally written with a value that allows 
the operating system to use the Context register as a 
pointer into the current PTE array in memory. 


Table 5.2 Context Register Fields 


The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry maps 
to an even-odd page pair. For a 4-Kbyte page size, this format can directly 
address the pair-table of 8-byte PTEs. For other page and PTE sizes, 
shifting and masking this value produces the appropriate address. 
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Bad Virtual Address Register (BadVAddr) (8) 

The Bad Virtual Address register ( BadVAddr ) is a read-only register that 
displays the most recent virtual address that caused one of the following 
exceptions: Address Error (e.g., unaligned access), TLB Invalid, TLB 
Modified, TLB Refill, Virtual Coherency Data Access, or Virtual Coherency 
Instruction Fetch. 

The processor does not write to the BadVAddr register when the EXL bit 
in the Status register is set to a 1. 

Figure 5.2 shows the format of the BadVAddr register. 
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Figure 5.2 BadVAddr Register Format 

Note: The BadVAddr register does not save any information for bus 
errors, since bus errors are not addressing errors. 


Count Register (9) 

The Count register acts as a timer, incrementing at a constant rate — half 
the maximum instruction issue rate — whether or not an instruction is 
executed, retired, or any forward progress is made through the pipeline. 

This register can be read or written. It can be written for diagnostic 
purposes or system initialization: for example, to synchronize processors. 

Figure 5.3 shows the format of the Count register. 
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Figure 5.3 Count Register Format' 


Compare Register (1 1) 

The Compare register acts as a timer (see also the Count register); it 
maintains a stable value that does not change on its own. 

When the value of the Count register equals the value of the Compare 
register, interrupt bit IP(7) in the Cause register is set. This causes an 
interrupt as soon as the interrupt is enabled. 

Writing a value to the Compare register, as a side effect, clears the timer 
interrupt. 

For diagnostic purposes, the Compare register is a read/write register. 
In normal use however, the Compare register is write-only. Figure 5.4 
shows the format of the Compare register. 
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Figure 5.4 Compare Register Format 
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Status Register (12) 

The Status register (SR) is a read/write register that contains the 
operating mode, interrupt enabling, and the diagnostic states of the 
processor. The following list describes the more important Status register 
fields; Figure 5.5 show the format of the entire register, including 
descriptions of the fields. Some of the important fields include: 

• The 8-bit Interrupt Mask (IM) field controls the enabling of eight inter- 
rupt conditions. Interrupts must be enabled before they can cause the 
exception, and the corresponding bits are set in both the Interrupt 
Mask field of the Status register and the Interrupt Pending field of the 
Cause register. For more information, refer to the Interrupt Pending 
(IP) field of the Cause register. IM[ 1:0] are the masks for the two soft- 
ware interrupts while IM[7:2] correspond to Int[5:0]. 

• The 4-bit Coprocessor Usability ( CU) field controls the usability of 4 
possible coprocessors. Regardless of the CUO bit setting, CPO is al- 
ways usable in Kernel mode. For all other cases, an instruction for or 
access to an unusable coprocessor causes an exception. 

• The 9-bit Diagnostic Status (DS) field (Status[24:16]) is used for self- 
testing, and checks the cache and virtual memory system. 

• The Reverse-Endian (RE) bit, bit 25, reverses the endianness of the 
machine. The processor can be configured as either little-endian or 
big-endian at system reset. This selection is always used in Kernel 
and Supervisor modes, and also in User mode when the RE bit is 0. 
Setting the RE bit to 1 inverts the User mode endianness. 

Status Register Format 

Figure 5.5 shows the format of the Status register. Table 5.3, which 
follows the figure, describes the Status register fields. 
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Field 

Description 

cu 

Controls the usability of each of the four coprocessor unit numbers. CPO Is always usable 
when In Kernel mode, regardless of the setting of the CUq bit. 

1 -4 usable 0 -4 unusable 

FR 

Enables additional floating-point registers 

0-^16 registers 1 -4 32 registers 

RE 

Reverse-Endian bit, valid in User mode. 

BEV 

Controls the location of TLB refill and general exception vectors. 

0 -4 normal l-» bootstrap 

SR 

1-4 Indicates a soft reset or NMI has occurred. 

CH 

Hit (tag match and valid state) or miss indication for last CACHE Hit Invalidate, Hit Write 
Back Invalidate, Hit Write Back, or Hit Set Virtual for a primary cache. 

0 -4 miss 1 -4 hit 

CE 

Contents of the ECC register set or modify the check bits of the caches when CE = 1 ; see 
description of the ECC register. 

DE 

Specifies that cache parity errors cannot cause exceptions. 

0 -4 parity remains enabled 1 -4 disables parity 

0 

Reserved. Must be written as zeroes, and returns zeroes when read. 

IM 

Interrupt Mask: controls the enabling of each of the external, internal, and software inter- 
rupts. An interrupt is taken if interrupts are enabled, and the corresponding bits are set in 
both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause 
register. IM[7:2] correspond to interrupts Int[5:0] and IM[1:0] to the software interrupts. 

0 -4 disabled 1-4 enabled 

KX 

KX controls whether the TLB Refill Vector or the XTLB Refill Vector address is used for TLB 
misses on kernel addresses 

0 -4 TLB Refill Vector 1 -4 XTLB Refill Vector 

SX 

Enables 64-bit virtual addressing and operations in Supervisor mode. The extended-address- 
ing TLB refill exception is used for TLB misses on supervisor addresses. 

0 -4 32-bit 1 -4 64-bit 

UX 

Enables 64-bit virtual addressing and operations in User mode. The extended-addressing TLB 
refill exception is used for TLB misses on user addresses. 

0^ 32-bit 1 -4 64-bit 

KSU 

Mode bits 

10 2 -4 User 01 2 -4 Supervisor OC^ -4 Kernel 

ERL 

Error Level 

0 -4 normal 1 -4 error 

EXL 

Exception Level 

0 -4 normal 1 -4 exception 

Note: When going from 0 to 1 , IE should be disabled (0) first. This would be done when pre- 
paring to return from the exception handler, such as before executing the ERET instruction. 

IE 

Interrupt Enable 

0 -4 disable interrupts 1 -4 enables interrupts 


Table 5.3 Status Register Fields 
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Status Register Modes and Access States 

Fields of the Status register set the modes and access states described 
in the sections that follow. 

Interrupt Enable: Interrupts are enabled when all of the following 
conditions are true: 

• IE =1 

• EXL = 0 

• ERL = 0 

If these conditions are met, the settings of the IM bits identify the 
interrupt. 

Note: Setting the IE bit may be delayed by up to 3 cycles. If performing 
nested interrupts, re-enable the IE bit first. 

Operating Modes: The following CPU Status register bit settings are 
required for User, Kernel, and Supervisor modes (see Chapter 4 for more 
information about operating modes). 

• The processor is in User mode when KSU = 10 2 , EXL = 0, and ERL = 0. 

• The processor is in Supervisor mode when KSU = OI 2 , EXL = 0, and 
ERL = 0. 

• The processor is in Kernel mode when KSU = 00 2 , or EXL = 1, or ERL 
= 1. 

32- and 64-bit Virtual Addressing: The following CPU Status register 
bit settings select 32- or 64-bit virtual addressing for User and Supervisor 
operating modes. Enabling 64-bit virtual addressing permits the execution 
of 64-bit opcodes and translation of 64-bit virtual addresses. 64-bit virtual 
addressing for User and Supervisor modes can be set independently but is 
always used for Kernel mode. 

• The KX field controls whether the TLB Refill Vector or the XTLB Refill 
Vector address is used for TLB misses on Kernel addresses. 64-bit op- 
codes are always valid in Kernel mode. 

• 64-bit addressing and operations are enabled for Supervisor mode 
when SX = 1 . 

• 64-bit addressing and operations are enabled for User mode when UX 
= 1. 

Kernel Address Space Accesses: Access to the kernel address space is 
allowed when the processor is in Kernel mode. 

Supervisor Address Space Accesses: Access to the supervisor address 
space is allowed when the processor is in Kernel or Supervisor mode, as 
described above in the paragraph titled Operating Modes. 

User Address Space Accesses: Access to the user address space is 
allowed in any of the three operating modes. 

Status Register Reset 

The contents of the Status register are undefined at reset, except for the 
following bits — ERL and BEV = 1 . 

The SR bit distinguishes between Reset and Soft Reset (Nonmaskable 
Interrupt [NMI]). 
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Cause Register (13) 

The 32-bit read/write Cause register describes the cause of the most 
recent exception. 

Figure 5.6 shows the fields of this register; Table 5.4, which follows the 
figure, describes the Cause register fields. A 5-bit exception code ( ExcCode ) 
indicates the cause of the most recent exception, as listed in Table 5.5 on 
page 5-8. 

All bits in the Cause register, with the exception of the IP(1:0) bits, are 
read-only; IP(1:0) are used for software interrupts. 


31 30 29 28 27 


Cause Register 

16 15 


8 7 6 


2 1 0 





* 


IP 

• 

Exc 

Code 


1 

1 

2 

12 

8 

1 

5 

2 


Figure 5.6 Cause Register Format 


Field 

Description 

BD 

Indicates whether the last exception taken occurred in a branch delay slot. 

1 delay slot 

0 — > normal 

CE 

Coprocessor unit number referenced when a Coprocessor Unusable excep- 
tion is taken. 

IP 

Indicates an interrupt is pending. 

1 -> interrupt pending 

0 -» no interrupt 

ExcCode 

Exception code field (see Table 5.5 on page 5-8) 

0 

Reserved. Must be written as zeroes, and returns zeroes when read. 


Table 5.4 Cause Register Fields 


5-7 























CPU Exception Processing 


Chapter 5 


Exception 

Code 

Value 

Mnemonic 

Description 

0 

Int 

Interrupt 

1 

Mod 

TLB modification exception 

2 

TLBL 

TLB exception (load or instruction fetch) 

3 

TLBS 

TLB exception (store) 

4 

AdEL 

Address error exception (load or instruction fetch) 

5 


Address error exception (store) 

6 

IBE 

Bus error exception (instruction fetch) 

7 

DBE 

Bus error exception (data reference: load or store) 

8 

Sys 

Syscall exception 

9 

Bp 

Breakpoint exception 

10 

RI 

Reserved instruction exception 

11 

CpU 

Coprocessor Unusable exception 

12 

Ov 

Arithmetic Overflow exception 

13 

Tr 

Trap exception 

14 

— 

Reserved 

15 

FPE 

Floating-Point exception 

16-31 

— 

Reserved 


Table 5.5 Cause Register ExcCode Field 


Exception Program Counter (EPC) Register (14) 

The Exception Program Counter (EPQ is a read/write register that 
contains the address at which processing resumes after an exception has 
been serviced. 

For synchronous exceptions, the EPC register contains either: 

• the virtual address of the instruction that was the direct cause of the 
exception, or 

• the virtual address of the immediately preceding branch or jump in- 
struction (when the instruction is in a branch delay slot, and the 
Branch Delay bit in the Cause register is set). 

The processor does not write to the EPC register when the EXLbit in the 
Status register is set to a 1 . 

Figure 5.7 shows the format of the EPC register. 
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Figure 5.7 EPC Register Format 
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XContext Register (20) 

The read/write XContext register contains a pointer to an entry in the 
page table entry (PTE) array, an operating system data structure that 
stores virtual-to-physical address translations. When there is a TLB miss, 
the operating system software loads the TLB with the missing translation 
from the PTE array. The XContext register duplicates some of the 
information provided in the BadVAddr register, and puts it in a form useful 
for a software TLB exception handler. 

The XContext register is for use with the XTLB refill handler, which loads 
TLB entries for references to a 64-bit address space, and is included solely 
for operating system use. The operating system sets the PTE base field in 
the register, as needed. Normally, the operating system uses the XContext 
register to address the current page map, which resides in the kernel- 
mapped segment kseg3. 

Figure 5.8 shows the format of the XContext register; Table 5.6, which 
follows the figure, describes the XContext register fields. 
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Figure 5.8 XContext Register Format 


The 27-bit BadVPN2 field has bits 39:13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry maps 
to an even-odd page pair. For a 4-Kbyte page size, this format may be used 
directly to address the pair-table of 8-byte PTEs. For other page and PTE 
sizes, shifting and masking this value produces the appropriate address. 


Field 

Description 

BadVPN2 

The Bad Virtual Page Number/2 field Is written by hardware on a 
miss. It contains the VPN of the most recent invalidly translated vir- 
tual address. 

R 

The Region field contains bits 63:62 of the virtual address. 

00 2 = user 

01 2 = supervisor 

1 1 2 = kernel. 

PTEBase 

The Page Table Entry Base read/write field is normally written with 
a value that allows the operating system to use the Context register 
as a pointer into the current PTE array in memory. 


Table 5.6 XContext Register Fields 


Error Checking and Correcting (ECC) Register (26) 

The 8-bit Error Checking and Correcting ( ECO register reads or writes 
primary-cache data parity bits for cache initialization, cache diagnostics, 
or cache error processing. (Tag parity is loaded from and stored to the 
TagLo register.) 

The ECC register is loaded by the Index Load Tag CACHE operation. 
Content of the ECC register is: 

• written into the primary data cache on store instructions (instead of 
the computed parity) when the CE bit of the Status register is set 

• substituted for the computed instruction parity for the CACHE oper- 
ation Fill 

To force a cache parity value use the Status CE bit and the ECC register. 
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Figure 5.9 shows the format of the ECC register; Table 5.7, which follows 
the figure, describes the register fields. 


31 

ECC Register 

8 

7 

0 

■ 

0 

mmm 


24 

8 

— 


Figure 4.9 ECC Register Format' 


Field 

Description 

ECC 

An 8-bit field specifying the parity bits read from or 
written to a primary cache. 

0 

Reserved. Must be written as zeroes, and returns 
zeroes when read. 


Table 5.7 ECC Register Fields 


Cache Error (CacheErr) Register (27) 

The 32-bit read-only CacheErr register processes parity errors in the 
primary cache. Parity errors cannot be corrected. 

The CacheErr register holds cache index and status bits that indicate 
the source and nature of the error; it is loaded when a Cache Error 
exception is asserted. When a read response returns with bad parity this 
exception is also asserted. 

Figure 5. 10 shows the format of the CacheErr register; , which follows 
the figure, describes the CacheErr register fields. 
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Field 

Description 

ER 

Type of reference 

0 — » instruction 

1 -» data 

EC 

Cache level of the error 

0 — > primary 

1 — » reserved 

ED 

Indicates if a data field error occurred 

0 -> no error 

1 — > error 

ET 

Indicates if a tag field error occurred 

0 — > no error 

1 -terror 

ES 

Indicates the error occurred accessing processor-managed resources, in response to an external 
request. 

0 -> internal reference 

1 — » external reference 

Since the R4600/R4700 doesn't have any external events that would look in a cache (which is 
the only processor-managed resource), this bit would not be set under normal operating 
conditions. 

EE 

Set if ti\e error occurred on the SysAD bus. 

Taking a cache error exception sets /clears this bit. 

EB 

Set if a data error occurred in addition to the instruction error (indicated by the remainder of 
the bits). If so, this requires flushing the data cache after fixing the instruction error. 

SIdx 

Physical address 21:3 of the reference that encountered the error. 

The address may not be the same as the address of the double word in error, but it is sufficient 
to locate that double word in the secondary cache. 

PIdx 

Virtual address 13:12 of the double word in error. 

To be used with SIdx to construct a virtual index for the primary caches. Only the lower two 
bits (bits 1 and 0) are vAddr; the high bit (bit 2) is zero. 

0 

Reserved. Must be written as zeroes, and returns zeroes when read. 


Table 5.8 CacheErr Register Fields 


Error Exception Program Counter (Error EPC) Register (30) 

The ErrorEPC register is similar to the EPC register, except that ErrorEPC 
is used on parity error exceptions. It is also used to store the program 
counter (PC) on Reset, Soft Reset, and nonmaskable interrupt (NMI) 
exceptions. 

The read/write ErrorEPC register contains the virtual address at which 
instruction processing can resume after servicing an error. This address 
can be: 

• the virtual address of the instruction that caused the exception 

• the virtual address of the immediately preceding branch or jump in- 
struction, when this address is in a branch delay slot. 

There is no branch delay slot indication for the ErrorEPC register. 
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Figure 5.11 shows the format of the ErrorEPC register. 


ErrorEPC Register 

63 0 

ErrorEPC 


64 


Figure 5.11 ErrorEPC Register Format 

Processor Exceptions 

This section describes the processor exceptions— it describes the cause 
of each exception, its processing by the hardware, and servicing by a 
handler (software). The types of exception, with exception processing 
operations, are described in the next section. 

Exception Types 

This section gives sample exception handler operations for the following 
exception types: 

• reset 

• soft reset 

• nonmaskable interrupt (NMI) 

• cache error 

• remaining processor exceptions 

When the EXL bit in the Status register is 0, either User or Supervisor 
operating mode is specified by the KSU bits in the Status register. When 
the EXL bit or the ERL bit is a 1 , the processor is in Kernel mode. 

When the processor takes an exception, the EXL bit is set to 1, which 
means the system is in Kernel mode. After saving the appropriate state, the 
exception handler typically resets the EXL bit back to 0. When restoring 
the state and restarting, the handler sets the EXL bit back to 1 . 

Returning from an exception, also resets the EXL bit to 0 (see the ERET 
instruction in Appendix A). 

In the following sections, sample hardware processes for various 
exceptions are shown, together with the servicing required by the handler 
(software). 

Reset Exception Process 

Figure 5.12 shows the Reset exception process. 


T: undefined 

Random TLBENTRlESM : 

Wired 0 

Config <- 0 II EC II EP II OQOQOOOO II BE 1111011 010 II Q10 K 1 11 1 lid undefined 3 
ErrorEPC PC 

SR «- SR 3 ,.23 H 1 If 0 II 0 II SR m I1 1 II SR 1 ,p 
PC OxFFFF FFFF BFCO 0000 ' Yi ; S 


Figure 5.12 Reset Exception Processing 
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Cache Error Exception Process 

Figure 5.13 shows the Cache Error exception process. 


T: ErrorEPC PC 

•• CachaEir^ii II HT tt ES II EE II EB II Q* 5 ; 

. BB^sR^il titsR^o 

if « 1 then /* Whet is tl^ BEt/ bit setting */ 

PC <- OxFFFF FFFF BFCO 0200 + 0x100 r access bpot-PROM area 7 
else 

PC<- OxFFFF FFFF A000 0066 + 0x100 it* access main memory area */ 
; endif 


Figure 5.13 Cache Error Exception Processing 

Soft Reset and NMI Exception Process 

Figure 5.14 shows the Soft Reset and NMI exception process. 


T: ErrorEPC «- PC : 

SR <- SR 31123 If 100 1(1 It SR 19 . 3 111 II SR t . 0 
PC <- OxFFFF FFFF BFCO 0000' ~ ' ' ; 


Figure 5.14 Soft Reset and NMI Exception Processing 

General Exception Process 

Figure 5. 15 shows the process used for exceptions other than Reset, Soft 
Reset, NMI, and Cache Error. 


S' Cause m BD If 6 II CE II O 12 II Caused II O: 

U ExcCode H O 2 ’ . 

jfSR^^ O tHen /‘ system in User or Su| 

jervfsor mode with no current exception */ 

EPC «- PC ■ 


endif 


SR^rSR 3l . 2 :ll1 fISRO 


if SR 22 j=:i then /* What is : the BEV bit j 

setting 7 7: : • 

PC*- OxFFFF FFFFBFCdjl^bb + vector 

/* access to uncached space 7 

else - k L'- • 

■ 7- PC 4* OxFFFF FFFF 8000 0000 -4- vector . 

■ . endif : : - 

■ access to cached space*/ . . 


Figure 5. 15 General Exception Processing (Except Reset, Soft Reset, NMI, 

and Cache Error) 

Exception Vector Locations 

The Reset, Soft Reset, and NMI exceptions are always vectored to 
location OxFFFF FFFF BFCO 0000 (virtual address), corresponding to 
ksegO. 

Addresses for all other exceptions are a combination of a vector offset 
and a base address. The base address is determined by the BEVbit of the 
Status register, as shown in Table 5.9. 
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Table 5. 10 shows the vector offset that is added to the base address to 
create the exception address. 


BEV 

R4600/R4700 Processor Vector Base 

Cache Error Base 

0 

OxFFFF FFFF 8000 0000 

OxFFFF FFFF A000 0000 

u 

OxFFFF FFFF BFCO 0200 

OxFFFF FFFF BFCO 0200 


Table 5.9 Exception Vector Base Addresses 


As shown in Table 5.9, when BEV = 0, the vector base for the Cache 
Error exception changes from ksegO (OxFFFF FFFF 8000 0000) to ksegl 
(OxFFFF FFFF A000 0000). 

When BEV = 1, the vector base for the Cache Error exception is OxFFFF 
FFFF BFCO 0200. This is an uncached and unmapped space, allowing the 
exception to bypass the cache and TLB. 


Exception 

R4600/R4700 Processor 
Vector Offset 

TLB refill, EXL = 0 

0x000 

XTLB refill, EXL = 0 (X = 64-bit TLB) 

0x080 

Cache Error 

0x100 

Others 

0x180 


Table 5.10 Exception Vector Offsets 


Priority of Exceptions 

The remainder of this chapter describes exceptions in the order of their 
priority, as shown in Table 5. 1 1. While more than one exception can occur 
for a single instruction, only the exception with the highest priority is 
reported. 


Exception Priority 

1 

Reset (highest priority) 

9 

Integer overflow, Trap, System Call, Break- 
point, Reserved Instruction, Coprocessor 
Unusable, or Floating-Point Exception 

2 

Soft Reset 


Address error — Data access 

3 

Nonmaskable Interrupt (NMI) 

11 

TLB refill — Data access 

B 

Address error — Instruction fetch 

12 

TLB invalid — Data access 

B 

TLB refill — Instruction fetch 

13 

TLB modified — Data write 

6 

TLB invalid — Instruction fetch 

14 

Cache error — Data access 

7 

Cache error — Instruction fetch 

15 

Bus error — Data access 

8 

Bus error — Instruction fetch 

16 

Interrupt (lowest priority) 


Table 5.11 Exception Priority Order 


Generally speaking, the exceptions described in the following sections 
are handled (“processed”) by hardware; these exceptions are then serviced 
by software. 
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Reset Exception 

This section explains the Reset exception. 

Cause 

The Reset exception occurs when the CoIdReset * 1 signal is asserted and 
then deasserted. This exception is not maskable. 

Processing 

The CPU provides a special exception vector for this exception of: 
OxFFFF FFFF BFCO 0000 

The Reset vector resides in unmapped and uncached CPU address 
space, so the hardware need not initialize the TLB or the cache to process 
this exception. It also means the processor can fetch and execute 
instructions while the caches and virtual memory are in an undefined 
state. 

The contents of all registers in the CPU are undefined when this 
exception occurs, except for the following register fields: 

• In the Status register, SR is cleared to 0, and ERL and BEV are set to 
1 . All other bits are undefined. 

• The Random register is initialized to the value of its upper bound. 

• The Wired register is initialized to 0. 

• Some of the Config Register bits are initialized from the boot-time 
mode stream. 

Reset exception processing is shown in Figure 5.12 on page 12. 

Servicing 

The Reset exception is serviced by: 

• initializing all processor registers, coprocessor registers, caches, and 
the memory system 

• performing diagnostic tests 

• bootstrapping the operating system 


'■ In the following sections (and throughout this manual) a signal followed by an 
asterisk, such as Reset*, is low active. 
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Soft Reset Exception 

This section explains the Soft Reset exception. 

Cause 

The Soft Reset exception occurs in response to the Reset* input signal, 
and execution begins at the Reset vector when Reset* is deasserted. This 
exception is not maskable. 

Processing 

The Reset exception vector is used for this exception, located within 
unmapped and uncached address space so that the cache and TLB need 
not be initialized to process this exception. When a Soft Reset occurs, the 
SR bit of the Status register is set to distinguish this exception from a Reset 
exception. 

The primary purpose of the Soft Reset exception is to reinitialize the 
processor after a fatal error during normal operations. Unlike an NMI, all 
cache and bus state machines are reset by this exception. Like Reset, it 
can be used on the processor in any state; the caches, TLB, and normal 
exception vectors need not be properly initialized. Soft Reset preserves the 
state of the caches and memory system, while resetting the bus state and 
cache state machine. 

When this exception occurs, the contents of all registers are preserved 
except for: 

• ErrorEPC register, which contains the restart PC 

• ERL bit of the Status register, which is set to 1 

• SR bit of the Status register, which is set to 1 

• BEV bit of the Status register, which is set to 1 

Because the Soft Reset can abort cache and bus operations, cache and 
memory state is undefined when this exception occurs. 

Soft reset exception processing is shown in Figure 5. 14 on page 13. 

Servicing 

The Soft Reset exception is serviced by saving the current processor 
state for diagnostic purposes, and reinitializing for the Reset exception. 


5-16 





CPU Exception Processing 


Chapter 5 


Nonmaskable Interrupt (NMI) Exception 

This section explains the Nonmaskable Interrupt exception. 

Cause 

The Nonmaskable Interrupt (NMI) exception occurs in response to the 
falling edge of the NMI pin, or an external write to the Int*[6] bit of the 
Interrupt register. 

Unlike all other interrupts, this interrupt is not maskable; it occurs 
regardless of the settings of the EXL, ERL, and the IE bits in the Status 
register. 

Processing 

The Reset exception vector is used for this exception. This vector is 
located within unmapped and uncached address space so that the cache 
and TLB need not be initialized to process an NMI interrupt. When an NMI 
exception occurs, the SR bit of the Status register is set to differentiate this 
exception from a Reset exception. 

Because an NMI can occur in the midst of another exception, it is not 
normally possible to continue program execution after servicing an NMI. 

Unlike Reset and Soft Reset, but like other exceptions, NMI is taken only 
at instruction boundaries. The state of the caches and memory system are 
preserved by this exception. 

To terminate a pending read that has hung the best approach is to 
return a bus error. However, if you wish to use a CPU exception to indicate 
a hung read. Soft Reset is preferable to NMI. 

When this exception occurs, the contents of all registers are preserved 
except for: 

• ErrorEPC register, which contains the restart PC 

• ERL bit of the Status register, which is set to 1 

• SR bit of the Status register, which is set to 1 

• BEV bit of the Status register, which is set to 1 

NMI exception processing is shown in Figure 5.14 on page 13. 

Servicing 

The NMI exception is serviced by saving the current processor state for 
diagnostic purposes, and reinitializing the system for the Reset exception. 
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Address Error Exception 

This section explains the Address Error exception. 

Cause 

The Address Error exception occurs when an attempt is made to execute 
one of the following: 

• load or store a doubleword that is not aligned on a doubleword 
boundary (except for use of special instruction) 

• load, fetch, or store a word that is not aligned on a word boundary 
(except for use of special instruction) 

• load or store a halfword that is not aligned on a halfword boundary 

• reference the kernel address space from User or Supervisor mode 

• reference the supervisor address space from User mode 

This exception is not maskable. 

Processing 

The common exception vector is used for this exception. The AdEL or 
AdES code in the Cause register is set, indicating whether the instruction 
(shown by the EPC register and BD bit in the Cause register) caused the 
exception with an instruction reference, load operation, or store operation. 

When this exception occurs, the BadVAddr register retains the virtual 
address that was not properly aligned or referenced protected address 
space. The contents of the VPN field of the Context and EntnjHi registers 
are undefined, as are the contents of the EntryLo register. 

The EPC register contains the address of the instruction that caused the 
exception, unless this instruction is in a branch delay slot. If it is in a 
branch delay slot, the EPC register contains the address of the preceding 
branch instruction and the BD bit of the Cause register is set as indication. 

Address Error exception processing is shown in Figure 5. 15 on page 13. 

Servicing 

Typically the process executing at the time is handed a segmentation 
violation signed. This error is usually fatal to the process incurring the 
exception. 

To resume execution, the EPC register must be altered so that the 
unaligned reference instruction does not re-execute; this is accomplished 
by adding a value of 4 to the EPC register (EPC register + 4) before 
returning. 

If an unaligned reference instruction is in a branch delay slot, 
interpretation of the branch instruction is required to resume execution. 
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TLB Exceptions 

This section explains the TLB Exceptions. For specifics on the 
exceptions listed here, refer to the following three subsections. 

Three types of TLB exceptions can occur: 

• TLB Refill occurs when there is no TLB entry that matches an at- 
tempted reference to a mapped address space. 

• TLB Invalid occurs when a virtual address reference matches a TLB 
entry that is marked invalid. 

• TLB Modified occurs when a store operation virtual address reference 
to memory matches a TLB entry which is marked valid but is not dirty 
(the entry is not writable). 

The following three subsections describe the TLB exceptions. 

TLB Refill Exception 

This subsection explains the TLB refill exception. 

Cause 

The TLB refill exception occurs when there is no TLB entry to match a 
reference to a mapped address space. This exception is not maskable. 

Processing 

There are two special exception vectors for this exception; one for 
references to 32-bit virtual address spaces, and one for references to 64- 
bit virtual address spaces. The UX, SX, and KX bits of the Status register 
determine whether the user, supervisor or kernel address spaces 
referenced are 32-bit or 64-bit spaces. All references use these vectors 
when the EXL bit is set to 0 in the Status register. This exception sets the 
TLBL or TLBS code in the ExcCode field of the Cause register. This code 
indicates whether the instruction, as shown by the EPC register and the 
BD bit in the Cause register, caused the miss by an instruction reference, 
load operation, or store operation. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers hold the virtual address that failed address translation. 
The EntryHi register also contains the ASID from which the translation 
fault occurred. The Random register normally suggests a valid location in 
which to place the replacement TLB entry. The contents of the EntryLo 
register are undefined. The EPC register contains the address of the 
instruction that caused the exception, unless this instruction is in a 
branch delay slot, in which case the EPC register contains the address of 
the preceding branch instruction and the BD bit of the Cause register is 
set. 

TLB Refill exception processing is shown in Figure 5.15 on page 13. 

Servicing 

To service this exception, the contents of the Context or XContext register 
are used as a virtual address to fetch memory locations containing the 
physical page frame and access control bits for a pair of TLB entries. The 
two entries are placed into the EntryLoO/ EntryLo 1 register; the EntryHi 
and EntryLo registers are written into the TLB. 

It is possible that the virtual address used to obtain the physical address 
and access control information is on a page that is not resident in the TLB. 
This condition is processed by allowing a TLB refill exception in the TLB 
refill handler. This second exception goes to the common exception vector 
because the EXL bit of the Status register is set. 
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TLB Invalid Exception 

This subsection explains the TLB invalid exception. 

Cause 

The TLB invalid exception occurs when a virtual address reference 
matches a TLB entry that is marked invalid (TLB valid bit cleared). This 
exception is not maskable. 

Processing 

The common exception vector is used for this exception. The TLBL or 
TLBS code in the EXcCode field of the Cause register is set. This indicates 
whether the instruction, as shown by the EPC register and BD bit in the 
Cause register, caused the miss by an instruction reference, load 
operation, or store operation. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers contain the virtual address that failed address 
translation. The EntryHi register also contains the ASID from which the 
translation fault occurred. The Random register normally contains a valid 
location in which to put the replacement TLB entry. The contents of the 
Entry Lo registers are undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless this instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

TLB Invalid exception processing is shown in Figure 5.15 on page 13. 

Servicing 

A TLB entry is typically marked invalid when one of the following is true: 

• a virtual address does not exist 

• the virtual address exists, but is not in main memory (a page fault) 

• a trap is desired on any reference to the page (for example, to main- 
tain a reference bit or during debug) 

After servicing the cause of a TLB Invalid exception, the TLB entry is 
located with TLBP (TLB Probe), and replaced by an entry with that entry’s 
Valid bit set. 
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TLB Modified Exception 

This subsection explains the TLB modified exception. 

Cause 

The TLB modified exception occurs when a store operation virtual 
address reference to memoiy matches a TLB entry that is marked valid but 
is not dirty and therefore is not writable. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Mod 
code in the Cause register is set. 

When this exception occurs, the BadVAddr, Context, XContext and 
EntryHi registers contain the virtual address that failed address 
translation. The EntryHi register also contains the ASID from which the 
translation fault occurred. The contents of the Entry Lo registers are 
undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless that instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

TLB Modified exception processing is shown in Figure 5.15 on page 13. 

Servicing 

The kernel uses the failed virtual address or virtual page number to 
identify the corresponding access control information. The page identified 
may or may not permit write accesses; if writes are not permitted, a write 
protection violation occurs. 

If write accesses are permitted, the page frame is marked dirty /writable 
by the kernel in its own data structures. The TLBP instruction places the 
index of the TLB entry that must be altered into the Index register. The 
Entryho register is loaded with a word containing the physical page frame 
and access control bits (with the D bit set), and the EntryHi and EntryLo 
registers are written into the TLB. 
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Cache Error Exception 

This section explains the Cache Error exception. 

Cause 

The Cache Error exception occurs when a primary cache parity error is 
detected. This exception is maskable by the DE bit of the Status register. 

Processing 

The processor sets the ERL bit in the Status register, saves the exception 
restart address in ErrorEPC register, and then transfers to a special vector 
in uncached space: 

If the BEV bit = 0, the vector is OxFFFF FFFF AOOO 0100. 

If the BEV bit = 1, the vector is OxFFFF FFFF BFCO 0300. 

No other registers are changed. 

Cache Error exception processing is shown in Figure 5. 13 on page 13. 

Servicing 

All errors should be logged. To correct cache parity errors the system 
uses the CACHE instruction to invalidate the cache block, overwrites the 
old data through a cache miss, and resumes execution with an ERET. 

Other errors are not correctable and are likely to be fatal to the current 
process. 
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Bus Error Exception 

This section explains the Bus Error exception. 

Cause 

A Bus Error exception is raised by board-level circuitry for events such 
as bus time-out, backplane bus parity errors, and invalid physical memory 
addresses or access types. This exception is not maskable. 

A Bus Error exception occurs only when a cache miss refill, uncached 
reference, or unbuffered write occurs synchronously; a Bus Error 
exception resulting from a buffered write transaction must be reported 
using the general interrupt mechanism. 

Processing 

The common interrupt vector is used for a Bus Error exception. The IBE 
or DBE code in the ExcCode field of the Cause register is set, signifying 
whether the instruction (as indicated by the EPC register and BD bit in the 
Cause register) caused the exception by an instruction reference, load 
operation, or store operation. 

The EPC register contains the address of the instruction that caused the 
exception, unless it is in a branch delay slot, in which case the EPC register 
contains the address of the preceding branch instruction and the BD bit of 
the Cause register is set. Bus Error processing is shown in Figure 5. 15 on 
page 13. 

Servicing 

The physical address at which the fault occurred can be computed from 
information available in the CPO registers. 

• If the IBE code in the Cause register is set (indicating an instruction 
fetch reference), the virtual address is contained in the EPC register. 

• If the DBE code is set (indicating a load or store reference), the in- 
struction that caused the exception is located at the virtual address 
contained in the EPC register (or 4+ the contents of the EPC register 
if the BD bit of the Cause register is set). 

The virtual address of the load and store reference can then be obtained 
by interpreting the instruction. The physical address can be obtained by 
using the TLBP instruction and reading the EntryLo register to compute 
the physical page number. 

The process executing at the time of this exception is handed a bus error 
signal, which is usually fatal. 
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Integer Overflow Exception 

This section explains the Integer Overflow exception. 

Cause 

An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, 
DADDI or DSUB 1 instruction results in a 2’s complement overflow. This 
exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the OV 
code in the Cause register is set. 

The EPC register contains the address of the instruction that caused the 
exception unless the instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

Integer Overflow exception processing is shown in Figure 5.15 on 
page 13. 

Servicing 

The process executing at the time of the exception is handed a floating- 
point exception/integer overflow signal. This error is usually fatal to the 
current process. 


’■ See Appendix A for instruction description. 
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Trap Exception 

This section explains the Trap exception. 

Cause 

The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, 
TGE1, TGEUI, TLTI, TLTU1, TEQI, or TNEI 1 instruction results in a TRUE 
condition. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the TV code 
in the Cause register is set. 

The EPC register contains the address of the instruction causing the 
exception unless the instruction is in a branch delay slot, in which case 
the EPC register contains the address of the preceding branch instruction 
and the BD bit of the Cause register is set. 

Trap exception processing is shown in Figure 5.15 on page 13. 

Servicing 

The process executing at the time of a Trap exception is handed a 
floating-point exception/integer overflow signal. This error is usually fatal. 


*• See Appendix A for instruction description. 
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System Call Exception 

This section explains the System Call exception. 

Cause 

A System Call exception occurs during an attempt to execute the 
SYSCALL instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Sys 
code in the Cause register is set. 

The EPC register contains the address of the SYSCALL instruction 
unless it is in a branch delay slot, in which case the EPC register contains 
the address of the preceding branch instruction. 

If the SYSCALL instruction is in a branch delay slot, the BD bit of the 
Status register is set; otherwise this bit is cleared. 

System Call exception processing is shown in Figure 5. 15 on page 13. 

Servicing 

When this exception occurs, control is transferred to the applicable 
system routine. 

To resume execution, the EPC register must be altered so that the 
SYSCALL instruction does not re-execute; this is accomplished by adding 
a value of 4 to the EPC register ( EPC register + 4) before returning. 

If a SYSCALL instruction is in a branch delay slot, a more complicated 
algorithm, beyond the scope of this description, may be required. 
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Breakpoint Exception 

This section explains the Breakpoint exception. 

Cause 

A Breakpoint exception occurs when an attempt is made to execute the 
BREAK instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the BP code 
in the Cause register is set. 

The EPC register contains the address of the BREAK instruction unless 
it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

If the BREAK instruction is in a branch delay slot, the BD bit of the 
Status register is set, otherwise the bit is cleared. 

Breakpoint exception processing is shown in Figure 5.15 on page 13. 

Servicing 

When the Breakpoint exception occurs, control is transferred to the 
applicable system routine. Additional distinctions can be made by 
analyzing the unused bits of the BREAK instruction (bits 25:6), and 
loading the contents of the instruction whose address the EPC register 
contains. A value of 4 must be added to the contents of the EPC register 
(EPC register + 4) to locate the instruction if it resides in a branch delay 
slot. 

To resume execution, the EPC register must be altered so that the 
BREAK instruction does not re-execute; this is accomplished by adding a 
value of 4 to the EPC register ( EPC register + 4) before returning. 

If a BREAK instruction is in a branch delay slot, interpretation of the 
branch instruction is required to resume execution. 
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Reserved Instruction Exception 

This section explains the Reserved Instruction exception. 

Cause 

The Reserved Instruction exception occurs when one of the following 
conditions occurs: 

• an attempt is made to execute an instruction with an undefined major 
opcode (bits 31:26) 

• an attempt is made to execute a SPECIAL instruction with an unde- 
fined minor opcode (bits 5:0) 

• an attempt is made to execute a REGIMM instruction with an unde- 
fined minor opcode (bits 20:16) 

• an attempt is made to execute 64-bit operations in 32-bit virtual ad- 
dressing when in User or Supervisor modes 

64-bit operations are always valid in Kernel mode regardless of the value 
of the KX bit in the Status register. 

This exception is not maskable. 

Reserved Instruction exception processing is shown in Figure 5.15 on 
page 13. 

Processing 

The common exception vector is used for this exception, and the RI code 
in the Cause register is set. 

The EPC register contains the address of the reserved instruction unless 
it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

Servicing 

No instructions in the MIPS ISA are currently interpreted. The process 
executing at the time of this exception is handed an illegal instruction/ 
reserved operand fault signal. This error is usually fatal. 
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Coprocessor Unusable Exception 

This sections explains the Coprocessor Unusable exception. 

Cause 

The Coprocessor Unusable exception occurs when an attempt is made 
to execute a coprocessor instruction for either: 

• a corresponding coprocessor unit that has not been marked usable, 
or 

• CPO instructions, when the unit has not been marked usable and the 
process executes in User mode. 

This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the CPU 
code in the Cause register is set. The contents of the Coprocessor Usage 
Error field of the coprocessor Control register indicate which of the four 
coprocessors was referenced. The EPC register contains the address of the 
unusable coprocessor instruction unless it is in a branch delay slot, in 
which case the EPC register contains the address of the preceding branch 
instruction. 

Coprocessor Unusable exception processing is shown in Figure 5.15 on 
page 13. 

Servicing 

The coprocessor unit to which an attempted reference was made is 
identified by the Coprocessor Usage Error field, which results in one of the 
following situations: 

• If the process is entitled access to the coprocessor, the coprocessor is 
marked usable and the corresponding user state is restored to the co- 
processor. 

• If the process is entitled access to the coprocessor, but the coproces- 
sor does not exist or has failed, interpretation of the coprocessor in- 
struction is possible. 

• If the BD bit is set in the Cause register, the branch instruction must 
be interpreted; then the coprocessor instruction can be emulated and 
execution resumed with the EPC register advanced past the coproces- 
sor instruction. 

• If the process is not entitled access to the coprocessor, the process ex- 
ecuting at the time is handed an illegal instruction/privileged instruc- 
tion fault signal. This error is usually fatal. 
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Floating-Point Exception 

This sections explains the Floating-Point exception. 

Cause 

The Floating-Point exception is used by the floating-point coprocessor. 
This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the FPE 
code in the Cause register is set. 

The contents of the Floating-Point Control/ Status register indicate the 
cause of this exception. 

Floating-Point exception processing is shown in Figure 5. 15 on page 13. 

Servicing 

This exception is cleared by clearing the appropriate bit in the Floating- 
Point Control/ Status register. 

For an unimplemented instruction exception, the kernel should emulate 
the instruction; for other exceptions, the kernel should pass the exception 
to the user program that caused the exception. 
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Interrupt Exception 

This sections explains the Interrupt exception. 

Cause 

The Interrupt exception occurs when one of the eight interrupt 
conditions is asserted. The significance of these interrupts is dependent 
upon the specific system implementation. 

Each of the eight interrupts can be masked by clearing the 
corresponding bit in the Int-Mask field of the Status register, and all of the 
eight interrupts can be masked at once by clearing the IE bit of the Status 
register. 

Processing 

The common exception vector is used for this exception, and the Int code 
in the Cause register is set. 

The IP field of the Cause register indicates current interrupt requests. It 
is possible that more than one of the bits can be simultaneously set (or 
even no bits may be set if the interrupt is asserted and then deasserted 
before this register is read). 

Interrupt exception processing is shown in Figure 5.15 on page 13. 

Servicing 

If the interrupt is caused by one of the two software-generated 
exceptions ( SW1 or SWO), the interrupt condition is cleared by setting the 
corresponding Cause register bit to 0. 

If the interrupt is hardware-generated, the interrupt condition is cleared 
by correcting the condition causing the interrupt pin to be asserted. 

NOTE: due to the write buffer, a store to an external device will not 
necessarily occur until after other instructions in the pipeline finish. Thus, 
the user must ensure that the store will occur before the return from 
exception instruction (ERET) is executed otherwise the interrupt may be 
serviced again even though there should be no interrupt pending. 
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Exception Handling and Servicing Flowcharts 

The remainder of this chapter contains figures of flowcharts for the 
exceptions described in Table 5.12, and guidelines for their handlers. 


Figure 

Description 

Figure 5.16, 
Figure 5.17 

General exceptions and their exception handler 

Figure 5. 18, 
Figure 5.19 

TLB/XTLB miss exception and their exception handler 

Figure 5.20 

Cache error exception and its handler 

Figure 5.21 

Reset, soft reset and NMI exceptions, and a guideline to 
their handler. 


Table 5. 12 List of Exception Flowcharts 


Generally speaking, the exceptions are handled by hardware (HW), and 
then the exceptions are serviced by software (SW). 
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Comments 


Set FP Control Status Register *pp control Status Register is only set 


Enhi 4- VPN2, ASID 
Context 4- VPN2 
Set Cause Register 
EXCCode, CE 


if the respective exception occurs. 
EnHi, X/Context are set only for 
TLB- Invalid, Modified, 

& Refill exceptions 


Yes Instr. in\ 
<T Br.Dly. Slot?, 


Cause 31 (BD) 4- 1 Cause 31 (BD) 4 - 0 


Check if exception within 
another exception 


Set BadVA 

Set BadVA 

EPC 4 - (PC - 4) 

EPC 4 - PC 


BadVA is set only for 
TLB- Invalid, Modified, 

Refill- and VCED/I exceptions 
Note: not set if Bus Error 
Exception 


EXL 4 - 1 


Processor forced to Kernel Mode 
& interrupt disabled 


=0 (normal) 


=1 (bootstrap) 


(Base is sign extended for 64 bits) 


PC OxFFFF FFFF 8000 0000 
+ 180 

(unmapped, cached) 


PC 4- OxFFFF FFFF BFC0 020< 
+ 180 

(unmapped, uncached) 


To General Exception Servicing Guidelines 


Exceptions other than Reset, Soft Reset, NMI, CacheErr or first-level TLB miss 
Note: Interrupts can be masked by IE or IMs 


Figure 5.16 General Exception Handler (HW) 
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Comments 


MFCO - 

X/Context 

EPC 

Status 

Cause 


* Unmapped vector so TLBMod, TLBInv, 
TLB Refill exceptions not possible 

* EXL=1 so Interrupt exceptions disabled 

* OS/System to avoid all other exceptions 

*Only CacheErr, Reset, Soft Reset, NMI 
exceptions possible. 


MTCO - 

(Set Status Bits:) 
KSU <- 00 
EXL <— 0 
& IE=1 


(optional - only to enable Interrupts while keeping Kernel Mode) 


Check CAUSE REG. & Jump to 
appropriate Service Code 


' After EXL=0, all exceptions allowed, 
(except interrupt if masked by IE or IM 
and CacheErr if masked by DE) 


Service Code 



* ERET is not allowed in the branch delay slot of 
another Jump Instruction 

* Processor does not execute the instruction which is 
in the ERETs branch delay slot 

* PC 4 - EPC; EXL 0 

* LLbit < — 0 


Figure 5.17 General Exception Servicing Guidelines (SW) 
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Enhi 4 - VPN2, ASID 
Context 4- VPN2 
Set Cause Reg. 
EXCCode, CE and 
Cause bit 31 (BD) 4- 1 


^EXl> 
(SR bit 1) 


Enhi 4- VPN2, ASID 
Context 4- VPN2 
Set Cause Reg. 

EXCCode, CE and 
Cause bit 31 (BD) 4- 0 


Check if exception within 
another exception 


EXL\^ =1 
(SR bit 1) i > 


Set BadVA 

Set BadVA 

EPC 4 - (PC - 4) 

EPC 4- PC 


^XTLB\ 

Instruction? 



Vec. Off. = 0x080 
1 


Points to Refill Exception 


Vec. Off. - 0x000 


Vec. Off. = 0x180 


EXL 4- 1 


Points to General Exception 

Processor forced to Kernel Mode & 
interrupt disabled 


=0 (normal) 


^BEV\ 
(SR bit 22) 


=1 (bootstrap) 


j (Base is sign extended for 64 bits) 

PC 4- OxFFFF FFFF 8000 0000 pq q; 

♦ Vec.Off. 

(unmapped, cached) (un 


PC 4- OxFFFF FFFF BFC0 0200 
+ Vec.Off. 

(unmapped, uncached) 


To TLB/XTLB Exception Servicing Guidelines 


Figure 5.18 TLB/XTLB Miss Exception Handler (HW) 
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Comments 




* Unmapped vector so TLBMod, TLBInv, 
TLB Refill or VCEP exceptions 

not possible 

* EXL=1 so Interrupt exceptions disabled 

* OS/System to avoid all other exceptions 

*Only CacheErr, Reset, Soft Reset, NMI 
exceptions possible. 


* Load the mapping of the virtual address in Context Reg. 
Move it to ENLO and Write into the TLB 

* There could be a TLB miss again during the mapping 
of the data or instruction address. The processor will 
jump to the general exception vector since the EXL is 1 . 
(Option to complete the first level refill in the general 
exception handler or ERET to the original instruction 
and take the exception again) 


* ERET is not allowed in the branch delay slot of 
another Jump Instruction 

* Processor does not execute the instruction which is 
in the ERET’s branch delay slot 

* PC EPC; EXL <- 0 

* LLbit <r- 0 


e 5.19 TLB/XTLB Exception Servicing G 
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Note: Can be masked/disabled by DE (SRI 6) bit = 1 



Figure 5.20 Cache Error Exception Handling (HW) 


and Servicing Guidelines (SW) 
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Soft Reset or NMI Exception Reset Exception 



Figure 5.21 Reset, Soft Reset ft NMI Exception Handling (HW) and Servicing 

Guidelines (SW) 
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This chapter describes the R4600 and R4700 floating-point unit (FPU) 
features, including the programming model, instruction set and formats, 
and the pipeline. 

The FPU, with associated system software, fully conforms to the 
requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary 
Floating-Point Arithmetic. In addition, the MIPS architecture fully supports 
the recommendations of the standard and precise exceptions. 

Overview 

The FPU operates as a coprocessor for the CPU (it is assigned 
coprocessor label CPI), and extends the CPU instruction set to perform 
arithmetic operations on floating-point values. 

The R4600/R4700 Floating-Point Coprocessor 

The R4600/R4700 incorporates an entire floating-point coprocessor on 
chip, including a floating-point register file and execution units. The 
floating-point coprocessor forms a seamless interface with the integer unit, 
decoding and executing instructions in parallel with the integer unit. In 
comparison to the R4600, the floating point coprocessor of the R4700 has 
improved floating multiply operations. 

The R4600/R4700 uses the floating-point unit to perform integer 
multiply and divide, and results are placed in the HI and LO registers. The 
values can then be transferred to the general purpose register file using the 
MFHI/MFLO instructions. The R4700 performs an integer multiply faster 
than the R4600 by 2 clock cycles, but it takes the same number of clock 
cycles for integer division. The R4700 improves the multiply compared to 
the R4600 by performing a single-precision multiply in 4 clock cycles, and 
a double-precision multiply in 5 clock cycles. 

Figure 6. 1 illustrates the functional organization of the FPU. 
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FPU Features 

This section briefly describes the operating model, the load/store 
instruction set, and the coprocessor interface in the FPU. A more detailed 
description is given in the sections that follow. 

• Full 64-bit Operation. When the FR bit in the CPU Status register 
equals 0, the FPU is configured for sixteen 64-bit registers for double- 
precision values or thirty-two 32-bit registers for single-precision val- 
ues. When the FR bit in the CPU Status register equals 1, the FPU is 
configured for thirty-two 64-bit registers. Each register can hold sin- 
gle- or double-precision values. The FPU also includes a 32-bit Con- 
trol/Status register that provides access to all IEEE-Standard 
exception handling capabilities. 

• Load and Store Instruction Set. Like the CPU, the FPU uses a load- 
and store-oriented instruction set, with single-cycle load and store 
operations. Overlap of multiply and add is supported. 

• Tightly Coupled Coprocessor Interface. The FPU resides on-chip to 
form a tightly coupled unit with a seamless integration of floating- 
point and fixed-point instruction sets. 

FPU Programming Model 

This section describes the set of FPU registers and their data 
organization. The FPU registers include Floating-Point General Purpose 
registers (FGRs) and two control registers: Control/ Status and 
Implementation/ 'Revision 

Floating-Point General Registers (FGRs) 

The FPU has a set of Floating-Point General Purpose registers (FGRs) that 
can be accessed in the following ways: 

• As 32 general-purpose registers (32 FGRs), each of which is 32-bits 
wide when the FR bit in the CPU Status register equals 0; or as 32 gen- 
eral-purpose registers (32 FGRs), each of which is 64-bits wide when 
FR equals 1. The CPU accesses these registers through move, load, 
and store instructions. 

• As 16 floating-point registers (see the next section for a description of 
FPRs), each of which is 64-bits wide, when the FR bit in the CPU Sta- 
tus register equals 0. The FPRs hold values in either single- or double- 
precision floating-point format. Each FPR corresponds to adjacently 
numbered FGRs as shown in Figure 6.2 on page 6-3. 

• As 32 floating-point registers (see the next section for a description of 
FPRs), each of which is 64-bits wide, when the FR bit in the CPU Sta- 
tus register equals 1. The FPRs hold values in either single- or double- 
precision floating-point format. Each FPR corresponds to an FGR as 
shown in Figure 6.2. 
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Floating-Point 

Floating-Point 

Floating-Point 

Floating-Point 

| Registers (FPR) General Purpose Registers Registers (FPR) General Purpose Registers 

(FR 

= 0) 


31 (F Q R) 0 

(FR = 

1) 

63 (PGR) o 

FPRO 

f (least) 

FGRO 


FPRO 

FGRO 


\ (most) 

FGR1 


FPR1 

FGR1 


FPR2 

/ (least) 

FGR2 


FPR2 

FGR2 


[ (most) 

FGR3 


FPR3 

FGR3 


• 

• 

• 



• 

• 

• 


• 

• 

• 

• 

• 

• 


FPR28 * 

f (least) 

FGR28 

FPR28 

FGR28 


\ (most) 

FGR29 

FPR29 

FGR29 


FPR30 - 

/ (least) 

FGR30 

FPR30 

FGR30 



[ (most) 

FGR31 

FPR31 

FGR31 






Floating-Point 







Control Registers 





Control/Status Register 


mplementation/Revision Register 


31 

FCR31 

0 

31 

FCRO 0 






i 









Figure 6.2 FPU Registers 


Floating-Point Registers 

The FPU provides: 

• 16 Floating-Point registers (FPRs) for Status. FR = 0, or 

• 32 Floating-Point registers [FPRs) for Status. FR = 1. 

These 64-bit registers hold floating-point values during floating-point 
operations and are physically formed from the General Purpose registers 
[FGRs). When the F. R bit in the Status register equals 1, the FPR references 
a single 64-bit FGR. 

The FPRs hold values in either single- or double-precision floating-point 
format. If the FR bit equals 0, only even numbers (the least register, as 
shown in Figure 6.2) can be used to address FPRs. When the FR bit is set 
to a 1, all FPR register numbers are valid. 

If the FRbit equals 0 during a double-precision floating-point operation, 
the general registers are accessed in double pairs. Thus, in a double- 
precision operation, selecting Floating-Point Register 0 ( FPRO ) actually 
addresses adjacent Floating-Point General Purpose registers FGRO and 
FGR1. 

Floating-Point Control Registers 

The FPU has 32 control registers (FCRs) that can only be accessed by 
move operations. The FCRs are described below: 

• The Implementation/ Revision register (FCRO) holds revision informa- 
tion about the FPU. 

• The Control/ Status register (FCR31) controls and monitors excep- 
tions, holds the result of compare operations, and establishes round- 
ing modes. 

• FCR1 to FCR30 are reserved. 


6-3 






Floating-Point Unit 


Chapter 6 


Table 6. 1 lists the assignments of the FCRs. 


PCR Number 

Use 

FCRO 

Coprocessor implementation and revision register 

FCR1 to FCR30 

Reserved 

FCR31 

Rounding mode, cause, trap enables, and flags 


Table 6.1 Floating-Point Control Register Assignments 


Implementation and Revision Register, (FCRO) 

The read-only Implementation and Revision register {FCRO) specifies the 
implementation and revision number of the FPU. This information can 
determine the coprocessor revision and performance level, and can also be 
used by diagnostic software. 

Figure 6.3 shows the layout of the register; Table 6.2, which follows the 
figure, describes the Implementation and Revision register {FCRO) fields. 



Figure 6.3 Implementation/Revision Register 


Field 

Description 

Imp 

Implementation number R4600: 0x20 

R4700: 0x21 

Rev 

Revision number in the form of y.x 

0 

Reserved. 


Table 6.2 FCRO Fields 


The revision number is a value of the form y.x, where: 

• y is a major revision number held in bits 7:4. 

• a: is a minor revision number held in bits 3:0. 

The revision number distinguishes some chip revisions; however, there 
is no guarantee that changes to the chip are necessarily reflected by the 
revision number, or that changes to the revision number necessarily reflect 
real chip changes. For this reason revision number values are not listed, 
and software should not rely on the revision number to characterize the 
chip. 

Control/Status Register (FCR31) 

The Control/ Status register (FCR31) contains control and status 
information that can be accessed by instructions in either Kernel or User 
mode. FCR31 also controls the arithmetic rounding mode and enables 
User mode traps, as well as identifying any exceptions that may have 
occurred in the most recently executed instruction, along with any 
exceptions that may have occurred without being trapped. 
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Figure 6.4 on page 6-5 shows the format of the Control/ Status register, 
and Table 6.3, which follows the figure, describes the Control/ Status 
register fields. Figure 6.5 on page 6-5 shows the Control/ Status register 
Cause, Flag, and Enable fields. 


31 


Control/Status Register (FCR31) 

25 24 23 22 18 17 12 11 


7 6 


2 1 


FS 


Cause 

EVZOU 


Enables 
VZOU I 


Flags 
VZOU I 


RM 


Figure 6.4 FP Control/Status Register Bit Assignments 


Field 


Description 


FS 


When set, denormalized results are flushed to 0 instead of causing 
an unimplemented operation exception. 


Condition bit. See description of Control/ Status register Condition 
bit. 


Cause 


Cause bits. See Figure 6.5 and the description of Control/ Status 
register Cause , Flag , and Enable bits. 


Enables 


Enable bits. See Figure 6.5 and the description of Control/ Status 
register Cause, Flag, and Enable bits. 


Flags 


Flag bits. See Figure 6.5 and the description of Control/ Status reg- 
ister Cause, Flag, and Enable bits. 


RM 


Rounding mode bits. See Table 6.4 on page 7 and the description 
of Control/ Status register Rounding Mode Control bits. 


Table 6.3 Control/Status Register Fields 



Figure 6.5 Control/Status Register Cause, Flag, and Enable Fields 
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Accessing the Control/Status Register 

When the Control/ Status register Is read by a Move Control From 
Coprocessor 1 (CFC1) instruction, all unfinished instructions in the 
pipeline are completed before the contents of the register are moved to the 
main processor. If a floating-point exception occurs as the pipeline 
empties, the FP exception is taken and the CFC1 instruction is re-executed 
after the exception is serviced. 

The bits in the Control/ Status register can be set or cleared by writing to 
the register using a Move Control To Coprocessor 1 (CTC1) instruction. 
CTC1 is not issued until all previous floating-point operations are 
complete. 

IEEE Standard 754 

IEEE Standard 754 specifies that floating-point operations detect 
certain exceptional cases, raise flags, and can invoke an exception handler 
when an exception occurs. These features are implemented in the MIPS 
architecture with the Cause, Enable, and Flag fields of the Control/ Status 
register. The Flag bits implement IEEE 754 exception status flags, and the 
Cause and Enable bits implement exception handling. 

Control/Status Register FS Bit 

When the FS bit is set, denormalized results are flushed to 0 instead of 
causing an unimplemented operation exception. 

Control/Status Register Condition Bit 

When a floating-point Compare operation takes place, the result is 
stored at bit 23, the Condition bit, to save or restore the state of the 
condition line. The C bit is set to 1 if the condition is true; the bit is cleared 
to 0 if the condition is false. Bit 23 is affected only by compare and Move 
Control To FPU instructions. 

Control/Status Register Cause, Flag, and Enable Fields 

Figure 6.5 on page 6-5 illustrates the Cause, Flag, and Finable fields of 
the Control/ Status register. 

Cause Bits 

Bits 17:12 in the Control/ Status register contain Cause bits, as shown 
in Figure 6.5 on page 6-5, which reflect the results of the most recently 
executed instruction. The Cause bits are a logical extension of the CPO 
Cause register; they identify the exceptions raised by the last floating-point 
operation and raise an interrupt or exception if the corresponding enable 
bit is set. If more than one exception occurs on a single instruction, each 
appropriate bit is set. 

The Cause bits are written by each floating-point operation (but not by 
load, store, or move operations). The Unimplemented Operation (E) bit is 
set to a 1 if software emulation is required, otherwise it remains 0. The 
other bits are set to 0 or 1 to indicate the occurrence or non-occurrence 
(respectively) of an IEEE 754 exception. 

Wien a floating-point exception is taken, no results are stored, and the 
only state affected is the Cause bits. Exceptions caused by an immediately 
previous floating-point operation can be determined by reading the Cause 
field. 

Enable Bits 

A floating-point operation that sets an enabled Cause bit forces an 
immediate exception, as does setting both Cause and Enable bits with 
CTC1. The floating-point exception or interrupt is enabled when the 
corresponding enable be is set. 

There is no enable for Unimplemented Operation (£3. Setting 
Unimplemented Operation always generates a floating-point exception. 


6-6 





Floating-Point Unit 


Chapter 6 


Before returning from a floating-point exception, or doing a CTC1, 
software must first clear the enabled Cause bits to prevent a repeat of the 
interrupt. Thus, User mode programs can never observe enabled Cause 
bits set; if this information is required in a User mode handler, it must be 
passed somewhere other than the Status register. 

For a floating-point operation that sets only unenabled Cause bits, no 
exception occurs and the default result defined by IEEE 754 is stored. In 
this case, the exceptions that were caused by the immediately previous 
floating-point operation can be determined by reading the Cause field. 

Flag Bits 

When an exception case is detected and the exception Enable is not set, 
the corresponding flag bit is set. If an exception is taken, none of the flag 
bits are modified. Note however that system software may set the flag bits 
before invoking a user exception handler. 

The Flag bits are cumulative and indicate that an exception was raised 
by an operation that was executed since they were explicitly reset. Flag bits 
are set to 1 if an IEEE 754 exception is raised, otherwise they remain 
unchanged. The Flag bits are never cleared as a side effect of floating-point 
operations; however, they can be set or cleared by writing a new value into 
the Status register, using a Move To Coprocessor Control instruction. 

Control/Status Register Rounding Mode Control Bits 

Bits 1 and 0 in the Control/ Status register constitute the Rounding Mode 
(RM) field. 

As shown in Table 6.4, these bits specify the rounding mode that the 
FPU uses for all floating-point operations. 


Rounding 

Mode RM(1:0) 

Mnemonic 

Description 

0 

RN 

Round result to nearest representable value; 
round to value with least-significant bit 0 when 
the two nearest representable values are equally 
near. 

1 

RZ 

Round toward 0: round to value closest to and not 
greater in magnitude than the infinitely precise 
result. 

2 

RP 

Round toward +~>: round to value closest to and 
not less than the infinitely precise result. 

3 

RM 

Round toward - ©°: round to value closest to and 
not greater than the infinitely precise result. 


Table 6.4 Rounding Mode Bit Decoding 


Floating-Point Formats 

The FPU performs both 32-bit (single-precision) and 64-bit (double- 
precision) IEEE standard floating-point operations. The 32-bit single- 
precision format has a 24-bit signed-magnitude fraction field (f+s) and an 
8-bit exponent (e), as shown in Figure 6.6. 


31 

30 23 

22 

0 

s 

e 

f 

i 

Sign 

Exponent 

Fraction 


1 

8 

23 


Figure 6.6 Single-Precision Floating-Point Format 
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The 64-bit double-precision format has a 53-bit signed-magnitude 
fraction field (f+s) and an 1 1-bit exponent, as shown in Figure 6.7. 


63 

62 

52 

51 

0 

■ 

S 


e 

t 

■■m 

■ 

Sign 


Exponent 

Fraction 


1 

11 

52 


Figure 6.7 Double-Precision Floating-Point Format 


As shown in the above figures, numbers in floating-point format are 
composed of three fields: 

• sign field, s 

• biased exponent, e = E + bias 

• fraction, f =.b J b 2 ....b p _ J 

The range of the unbiased exponent E includes every integer between 
the two values E mln and E max inclusive, together with two other reserved 
values: 

• Emjn -1 (to encode ±0 and denormalized numbers) 

• E may +1 (to encode ±* and NaNs [Not a Number]) 

For single- and double-precision formats, each representable nonzero 
numerical value has just one encoding. 

For single- and double-precision formats, the value of a number, v, is 
determined by the equations shown in Table 6.5. 


No. 

Equation 

a) 

if E = E max +1 and f * 0, then 1 / is NaN, regardless of s 

(2) 

it E = E max +1 and f = 0, then v= (-1) s ~ 

(3) 

if E min SES E max , then y= (-1) S 2 E (1./) 

(4) 

if E = E min — 1 and f * 0, then i/= (-1 ) s 2 Emin (0./) 

(5) 

if E = E min — 1 and f = 0, then v = (-1 ) s 0 


Table 6.5 Equations for Calculating Values in Single and 
Double-Precision Floating-Point Format 


For all floating-point formats, if v is NaN, the most-significant bit of f 
determines whether the value is a signaling or quiet NaN: v is a signaling 
NaN if the most-significant bit of/is set, otherwise, v is a quiet NaN. 
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defines the values for the format parameters. Minimum and maximum 
floating-point values are given in Table 6.7. 


Parameter 

Format 

Single 

Double 

f 

24 

53 

Emax 

+ 127 

+ 1023 

Emin 

-126 

-1022 

Exponent bias 

+ 127 

+ 1023 

Exponent width in bits 

8 

11 

Integer bit 

hidden 

hidden 

Fraction width in bits 

24 

53 

Format width in bits 

32 

64 


Table 6.6 Floating-Point Format Parameter Values 


Type 

Value 

Float Minimum 

1 .40 129846e-45 

Float Minimum Norm 

1 . 1 7549435e-38 

Float Maximum 

3.40282347e+38 

Double Minimum 

4.9406564584 124654e-324 

Double Minimum Norm 

2 .22507385850720 1 4e-308 

Double Maximum 

1 . 797693 1 348623 1 57e+308 


Table 6.7 Minimum and Maximum Floating-Point Values 


Binary Fixed-Point Format 

Binary fixed-point values are held in 2’s complement format. Unsigned 
fixed-point values are not directly provided by the floating-point 
instruction set. Figure 6.8 illustrates binary fixed-point format; Table 6.8, 
which follows the figure, lists the binary fixed-point format fields. 



Figure 6.8 Binary Fixed-Point Format 


Field 

Description 

sign 

sign bit 

integer 

integer value 


Table 6.8 Binary Fixed-Point Format Fields 
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Floating-Point Instruction Set Overview 

All FPU instructions are 32-bits long, aligned on a word boundaiy. They 
can be divided into the following groups: 

• Load, Store, and Move instructions move data between memory, the 
main processor, and the FPU General Purpose registers. 

• Conversion instructions perform conversion operations between the 
various data formats. 

• Computational instructions perform arithmetic operations on float- 
ing-point values in the FPU registers. 

• Compare instructions perform comparisons of the contents of regis- 
ters and set a conditional bit based on the results. 

• Branch on FPU Condition instructions perform a branch to the spec- 
ified target if the specified coprocessor condition is met. 

Table 6.9 through Table 6.12 list the instruction set of the FPU. A 
complete description of each instruction is provided in Appendix B. 

In the instruction formats shown in Table 6.9 through Table 6. 12, the 
Jmt appended to the instruction opcode specifies the data format: s 
specifies single-precision binary floating-point, d specifies double- 
precision binary floating-point, and w specifies binary fixed-point. 


OpCode 

Description 

LWC1 

Load Word to FPU 

SWC1 

Store Word from FPU 

LDC1 

Load Doubleword to FPU 

SDC1 

Store Doubleword From FPU 

MTC1 

Move Word To FPU 

MFC1 

Move Word From FPU 

CTC1 

Move Control Word To FPU 

CFC1 

Move Control Word From FPU 

DMTC1 

Doubleword Move To FPU 

DMFC1 

Doubleword Move From FPU 


Table 6.9 FPU Instruction Summary: Load, Move and Store Instructions 


Opcode 

Description 

CVT.S.fmt 

Floating-point Convert to Single FP 

CVT.D.fmt 

Floating-point Convert to Double FP 

CVT.W.fmt 

Floating-point Convert to Single Fixed Point 

ROUND, w.fmt 

Floating-point Round 

TRUNC.w.fmt 

Floating-point Truncate 

CEIL.w.fmt 

Floating-point Ceiling 

FLOOR.w.fmt 

Floating-point Floor 


Table 6.10 FPU Instruction Summary: Conversion Instructions 
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OpCode 

Description 

ADD.fmt 

Floating-point Add 

SUB.fmt 

Floating-point Subtract 

MUL.fmt 

Floating-point Multiply 

DIV.fmt 

Floating-point Divide 

ABS.fmt 

Floating-point Absolute Value 

MOV.fmt 

Floating-point Move 

NEG.fmt 

Floating-point Negate 

SQRT.fmt 

Floating-point Square Root 


Table 6.11 FPU Instruction Summary: Computational Instructions 


OpCode 

Description 

C.cond.fmt 

Floating-point Compare 

BC1T 

Branch on FPU True 

BC1F 

Branch on FPU False 

BC1TL 

Branch on FPU True Likely 

BC1FL 

Branch on FPU False Likely 


Table 6.12 FPU Instruction Summary: Compare and Branch Instructions 


Floating-Point Load, Store, and Move Instructions 

This section discusses the manner in which the FPU uses the load, store 
and move instructions listed in Table 6.9 on page 10; Appendix B provides 
a detailed description of each instruction. 

Transfers Between FPU and Memory 

All data movement between the FPU and memory is accomplished by 
using one of the following instructions: 

• Load Word To Coprocessor 1 (LWC1) or Store Word To Coprocessor 1 
(SWC 1) instructions, which reference a single 32-bit word of the FPU 
general registers 

• Load Doubleword (LDC1) or Store Doubleword (SDC1) instructions, 
which reference a 64-bit doubleword. 

These load and store operations are unformatted; no format conversions 
are performed and therefore no floating-point exceptions can occur due to 
these operations. 

With the LDC1 and SDC1 instructions the R4600/R4700 floating-point 
unit can take advantage of the 64-bit wide data cache and issue a 
coprocessor load or store double-word instruction with every cycle. 

Transfers Between FPU and CPU 

Data can also be moved directly between the FPU and the CPU by using 
one of the following instructions: 

• Move To Coprocessor 1 (MTC1) 

• Move From Coprocessor 1 (MFC1) 

• Doubleword Move To Coprocessor 1 (DMTC1) 

• Doubleword Move From Coprocessor 1 (DMFC1) 

Like the floating-point load and store operations, these operations 
perform no format conversions and never cause floating-point exceptions. 
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Load Delay and Hardware Interlocks 

The instruction immediately following a load can use the contents of the 
loaded register. In such cases the hardware interlocks, requiring 
additional real cycles; for this reason, scheduling load delay slots is 
desirable, although it is not required for functional code. 

Data Alignment 

All coprocessor loads and stores reference the following aligned data 
items: 

• For word loads and stores, the access type is always WORD, and the 
low-order 2 bits of the address must always be 0. 

• For doubleword loads and stores, the access type is always DOUBLE- 
WORD, and the low-order 3 bits of the address must always be 0. 

Endianness 

Regardless of byte-numbering order (endianness) of the data, the 
address specifies the byte that has the smallest byte address in the 
addressed field. For a big-endian system, it is the leftmost byte; for a little- 
endian system, it is the rightmost byte. 

Floating-Point Conversion Instructions 

Conversion instructions perform conversions between the various data 
formats such as single- or double-precision, fixed- or floating-point 
formats. Table 6.10 on page 10 lists conversion instructions; Appendix B 
gives a detailed description of each instruction. 

Floating-Point Computational Instructions 

Computational instructions perform arithmetic operations on floating- 
point values, in registers. Table 6.11 on page 1 1 lists the computational 
instructions and Appendix B provides a detailed description of each 
instruction. There are two categories of computational instructions: 

• 3-Operand Register-Type instructions, which perform floating-point 
addition, subtraction, multiplication, division, and square root. 

• 2-Operand Register-Type instructions, which perform floating-point 
absolute value, move, and negate. 

Branch on FPU Condition Instructions 

Table 6.12 on page 11 lists the Branch on FPU (coprocessor unit 1) 
condition instructions that can test the result of the FPU compare (C.cond) 
instructions. Appendix B gives a detailed description of each instruction. 

Floating-Point Compare Operations 

The floating-point compare (C.fmt.cond) instructions interpret the 
contents of two FPU registers (fs, ft) in the specified format (fmt) and 
arithmetically compare them. A result is determined based on the 
comparison and conditions (cond) specified in the instruction. 

Table 6.12 on page 1 1 lists the compare instructions; Appendix B gives 
a detailed description of each instruction. Table 6.13 on page 13 lists the 
mnemonics for the compare instruction conditions. 
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Mnemonic Definition 


Mnemonic Definition 


False T True 


UN 

Unordered 

OR 

Ordered 

EQ 

Equal 

NEQ 

Not Equal 

UEQ 

Unordered or Equal 

OLG 

Ordered or Less Than or Greater Than 

OLT 

Ordered Less Than 

UGE 

Unordered or Greater Than or Equal 

ULT 

Unordered or Less Than 

OGE 

Ordered Greater Than 

OLE 

Ordered Less Than or Equal 

UGT 

Unordered or Greater Than 


Unordered or Less Than or Equal 

OGT 

Ordered Greater Than 

SF 

Signaling False 

ST 

Signaling True 

NGLE 

Not Greater Than or Less Than or Equal 

GLE 

Greater Than, or Less Than or Equal 

SEQ 

Signaling Equal 

SNE 

Signaling Not Equal 

NGL 

Not Greater Than or Less Than 

GL 

Greater Than or Less Than 

LT 

Less Than 

NLT 

Not Less Than 

NGE 

Not Greater Than or Equal 

GE 

Greater Than or Equal 

LE 

Less Than or Equal 

NLE 

Not Less Than or Equal 

NGT 

Not Greater Than 

GT 

Greater Than 


Table 6.13 Mnemonics and Definitions of Compare Instruction Conditions 


FPU Instruction Pipeline Overview 

The FPU provides an instruction pipeline that parallels the CPU 
instruction pipeline. It shares the same five-stage pipeline architecture 
with the CPU (see Chapter 3). 

Instruction Execution 

Figure 6.9 illustrates the 5-stage FPU pipeline. This is the same as that 
of the integer pipeline but allows for the longer execution times of the 
floating-point instructions. 
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Figure 6.9 on page 6-13 assumes that one Instruction is completed 
every PCycle. Most FPU instructions, however, require more than one cycle 
in the EX stage. This means the FPU must stall the pipeline if an 
instruction execution cannot proceed because of register or resource 
conflicts. 

Floating-point operations proceed in parallel with non-floating-point 
operations. Floating-point operations are not allowed to overlap each 
other, with two exceptions: 

• An add operation may start 2 cycles after the start of a multiply and 
thus will be completely overlapped by the multiply. 

• A multiply operation may overlap for up to 2 cycles, as follows: 

R4600: A new multiply may start 6 cycles after another multiply. 

R4700: A new multiply may start 4 cycles after another multiply 
(for both single and double precision). 

Non-floating-point operations as well as other integer operations may be 
executed in parallel with the floating-point operations. All of this is 
handled automatically by internal hardware in the R4600/R4700. 

Instruction Execution Cycle Time 

Unlike the CPU, which executes almost all instructions in a single cycle, 
more time may be required to execute FPU instructions. 

Table 6.14 gives the minimum latency of each floating-point operation. 


Operation 

Pipeline Cycles 

Operation 

Pipeline Cycles 

Single 

Double 

Single 

Double 

ADD.fmt 

4 

4 

BC1T 

1 

SUB.fmt 

4 

4 

BC1F 

1 

MUL.fmt 



BC1TL 

1 

R4600 

8 

8 




R4700 

4 

5 




DIV.fmt 

32 

61 

BC1FL 

1 

SQRT.fmt 

31 

60 

LWC1, LDC1 

2 

ABS.fmt 

1 

1 

SWC1, SDC1 

1 

MOV.fmt 

1 

1 

TRUNC. W.fmt 

4 

4 

NEG.fmt 

1 

1 

MTC1, DMTC1 

2 

ROUND. W.fmt 

4 

4 

MFC1, DMFC1 

2 

CElL.W.fmt 

4 

4 

CTC1 

3 

FLOOR. W.fmt 

4 

4 

CFC1 

2 

CVT.S.fmt 

(a) 

4 

CMP 

3 

3 

CVT.D.fmt 

2 

(a) 

FIX 

4 

4 

CVT. W.fmt 

4 

4 

FLOAT 

6 

6 

C.fmt.cond 

3 

3 





Note: ^ These operations are illegal. 

Table 6.14 Floating-Point Operation Latencies 
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Instruction Scheduling Constraints 

The FPU resource scheduler only issues instructions to the FPU op units 
(adder and multiplier) when no hardware use conflicts will occur. In 
addition, some overlap possibilities are disallowed to keep the scheduler 
simple (and/or increase performance). 

FPU Multiplier Constraints 

The FPU multiplier is partially pipelined in the R4600, allowing a new 
multiply to begin every 6 cycles. It is more fully pipelined in the R4700, 
allowing a new multiply to begin every 4 cycles. 

FPU Adder Constraints 

The FPU scheduler may issue an add operation (ADD.fmt or SUB.fmt) 2 
cycles after a multiply (MUL.fmt). 

Resource Scheduling Rules 

The FPU Resource Scheduler issues instructions while adhering to the 
rules described below. These scheduling rules optimize op unit executions; 
if the rules are not followed, the hardware interlocks to guarantee correct 
operation. 

DIV.[S,D] can start only when all of the following conditions are met in 
the 1A phase. 

• The adder is idle (division is performed in the adder). 

• The multiplier is idle. 

MUL.[S,D] can start only when all of the following conditions are met in 
the 1A phase. 

• The multiplier is one of the following: 

- idle. 

- Started execution at least 6 cycles earlier on the current multiply 

• The adder is idle. 

S9RT.[S,D] can start when the following conditions are met in the 1A 
phase. 

• The adder is idle. 

• The multiplier must be idle. 

CVT.fmt instructions can only start when all of the following conditions 
are met in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 

ADD.[S,D] or SUB.[S,D] can start only when all of the following 
conditions are met in the 1A phase. 

• The adder is idle 

• The multiplier is either: 

- idle. 

- started execution of the current multiply at least 2 cycles earlier. 

NEG.[S,D] or ABS.[S,D] can start only when all of the following 
conditions are met in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 

C.COND.[S,D] can start only when all of the following conditions are met 
in the 1A phase. 

• The adder is idle. 

• The multiplier is idle. 


6-15 





Floating-Point Unit 


Chapter 6 



6-16 




Floating-Point Exceptions Chapter 7 



This chapter describes FPU floating-point exceptions, including FPU 
exception types, exception trap processing, exception flags, saving and 
restoring state when handling an exception, and trap handlers for IEEE 
Standard 754 exceptions. 

A floating-point exception occurs whenever the FPU cannot handle 
either the operands or the results of a floating-point operation in its normal 
way. The FPU responds by generating an exception to initiate a software 
trap or by setting a status flag. 

Exception Types 

Hie FP Control/ Status register described in Chapter 6 contains an 
Enable bit for each exception type; exception Enable bits determine 
whether an exception will cause the FPU to initiate a trap or set a status 
flag. 

• If a trap is taken, the FPU remains in the state found at the beginning 
of the operation and a software exception handling routine executes. 

• If no trap is taken, an appropriate value is written into the FPU des- 
tination register and execution continues. 

The FPU supports the five IEEE Standard 754 exceptions: 

• Inexact (I) 

• Underflow (U) 

• Overflow (O) 

• Division by Zero (Z) 

• Invalid Operation (V) 

Cause bits, Enables, and Flag bits (status flags) are used. 

The FPU adds a sixth exception type. Unimplemented Operation (E). 
This exception indicates the use of a software implementation. The 
Unimplemented Operation exception has no Enable or Flag bit; whenever 
this exception occurs, an unimplemented exception trap is taken. 

Figure 7. 1 illustrates the Control/ Status register bits that support 
exceptions. 
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Figure 7.1 Control/Status Register Exception/Flag/Trap /Enable Bits 
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Each of the five IEEE Standard 754 exceptions (V, Z, O, U, I) is 
associated with a trap under user control, and is enabled by setting one of 
the five Enable bits. When an exception occurs and its corresponding 
Enable bit is not set, both the corresponding Cause and Flag bits are set. 
When an exception occurs and its corresponding Enable bit is set, the 
corresponding Cause bit is set and the subsequent exception processing 
allows a trap to be taken. 

Exception Trap Processing 

When a floating-point exception trap is taken, the Cause register 
indicates the floating-point coprocessor is the cause of the exception trap. 
The Floating-Point Exception (FPE) code is used, and the Cause bits of the 
floating-point Control/ Status register indicate the reason for the floating- 
point exception. These bits are, in effect, an extension of the system 
coprocessor Cause register. 

Flags 

A Flag bit is provided for each IEEE exception. This Flag bit is set to a 
1 on the assertion of its corresponding exception, with no corresponding 
exception trap signaled. 

The Flag bit is reset by writing a new value into the Status register; flags 
can be saved and restored by software either individually or as a group. 

When no exception trap is signaled, the floating-point coprocessor takes 
a default action, providing a substitute value for the exception-causing 
result of the floating-point operation. The particular default action taken 
depends upon the type of exception. Table 7. 1 lists the default action 
taken by the FPU for each of the IEEE exceptions. 


Field 

Description 

Rounding 

Mode 

Default action 

I 

Inexact exception 

Any 

Supply a rounded result 

U 

Underflow exception 

Any 

Take unimplemented unless FCSR.FS bit is set. 

o 

Overflow exception 

RN 

Modify overflow values to °° with the sign of the 
intermediate result 



RZ 

Modify overflow values to the format’s largest finite 
number with the sign of the intermediate result 



RP 

Modify negative overflows to the format’s most nega- 
tive finite number; modify positive overflows to + **> 



RM 

Modify positive overflows to the format’s largest 
finite number; modify negative overflows to - <» 

z 

Division by zero 

Any 

Supply a properly signed °o 

V 

Invalid operation 

Any 

Supply a quiet Not a Number (NaN) 


Table 7.1 Default FPU Exception Actions 


The FPU detects the eight exception causes internally. When the FPU 
encounters one of these unusual situations, it causes either an IEEE 
exception or an Unimplemented Operation exception (E). 
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lists the exception-causing conditions of the IEEE Standard 754. 


FPA Internal 
Result 

IEEE 

Standard 754 

Trap 

Enable 

Trap 

Disable 

Notes 

Inexact result 

I 

I 

I 

Loss of accuracy 

Exponent overflow 

o,i a 

0,1 

0,1 

Normalized exponent > E max 

Division by zero 

Z 

Z 

Z 

Zero is (exponent = E^^-l, mantissa = 0) 

Overflow on convert 

V 

E 

E 

Source out of integer range 

Signaling NaN source 

V 

V 

V 

Signaling NaN source produces quiet NaN result 

Invalid operation 

V 

V 

V 

0/0, etc. 

Exponent underflow 

u 

E 

E 

Normalized exponent < 

Denormalized source 

None 

E 

E 

Exponent = E-l and mantissa <> 0 

Note: a The IEEE Standard 754 specifies an inexact exception on overflow only if the overflow trap is disabled. 


Table 7.2 FPU Exception-Causing Conditions 


FPU Exceptions 

The following sections describe the conditions that cause the FPU to 
generate each of its exceptions, and details the FPU response to each 
exception-causing condition. 

Inexact Exception (I) 

The FPU generates the Inexact exception if the rounded result of an 
operation is not exact or if it overflows. The FPU usually examines the 
operands of floating-point operations before execution actually begins, to 
determine (based on the exponent values of the operands) if the operation 
can possibly cause an exception. If there is a possibility of an instruction 
causing an exception trap, the FPU uses a coprocessor stall to execute the 
instruction. 

It is impossible, however, for the FPU to predetermine if an instruction 
will produce an inexact result. If Inexact exception traps are enabled, the 
FPU uses the coprocessor stall mechanism to execute all floating-point 
operations that require more than two cycles. Since this mode of execution 
can impact performance, Inexact exception traps should be enabled only 
when necessary. 

Trap Enabled Results: If Inexact exception traps are enabled, the result 
register is not modified and the source registers are preserved. 

Trap Disabled Results: The rounded or overflowed result is delivered to 
the destination register if no other software trap occurs. 

Invalid Operation Exception (V) 

The Invalid Operation exception is signaled if one or both of the 
operands are invalid for an implemented operation. When the exception 
occurs without a trap, the MIPS ISA defines the result as a quiet Not a 
Number (NaN). The invalid operations are: 

• Addition or subtraction: magnitude subtraction of infinities, such as: 
(+ °o) + (— t») or (— <») — (- «») 

• Multiplication: 0 times with any signs 

• Division: 0/0, or °°/°°, with any signs 

• Comparison of predicates involving < or > without?, when the oper- 
ands are unordered 

• Any arithmetic operation on a signaling NaN. Amove (MOV) operation 
is not considered to be an arithmetic operation, but absolute value 
(ABS) and negate (NEG) are considered to be arithmetic operations 
and cause this exception if one or both operands is a signaling NaN. 

• Square root: Vx, where x is less than zero 


7-3 










































Floating-Point Exceptions 


Chapter 7 


Software can simulate the Invalid Operation exception for other 
operations that are invalid for the given source operands. Examples of 
these operations include IEEE Standard 754-specified functions 
implemented in software, such as Remainder: x REM y, where y is 0 or x 
is infinite: conversion of a floating-point number to a decimal format whose 
value causes an overflow, is infinity, or is NaN; and transcendental 
functions, such as In (-5) or cos- 1(3). Refer to Appendix B for examples or 
for routines to handle these cases. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: The FPU sets the Invalid Operation Exception 
flag and a quiet NaN is delivered to the destination register. 

Division-by-Zero Exception (Z) 

The Division-by-Zero exception is signaled on an implemented divide 
operation if the divisor is zero and the dividend is a finite nonzero number. 
Software can simulate this exception for other operations that produce a 
signed infinity, such as ln(0), sec(jt/2), csc(O), or 0 -1- 

Trap Enabled Results: The result register is not modified, and the 
source registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is a correctly 
signed infinity. 

Overflow Exception (O) 

The Overflow exception is signaled when the magnitude of the rounded 
floating-point result, with an unbounded exponent range, is larger than 
the largest finite number of the destination format. (This exception also 
sets the Inexact exception and Flag bits.) 

Trap Enabled Results: The result register is not modified, and the 
source registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is determined 
by the rounding mode and the sign of the intermediate result. 

Underflow Exception (U) 

Two related events contribute to the Underflow exception: 

• creation of a tiny nonzero result between ±2 Emm which can cause 
some later exception because it is so tiny 

• extraordinary loss of accuracy during the approximation of such tiny 
numbers by denormalized numbers. 

IEEE Standard 754 allows a variety of ways to detect these events, but 
requires they be detected the same way for all operations. 

Tinniness can be detected by one of the following methods: 

• after rounding (when a nonzero result, computed as though the expo- 
nent range were unbounded, would lie strictly between ±2 Etnui ) 

• before rounding (when a nonzero result, computed as though the ex- 
ponent range and the precision were unbounded, would lie strictly be- 
tween ±2 Emin ). 

The MIPS architecture requires that tininess be detected after rounding. 

Loss of accuracy can be detected by one of the following methods: 

• denormalization loss (when the delivered result differs from what 
would have been computed if the exponent range were unbounded) 

• inexact result (when the delivered result differs from what would have 
been computed if the exponent range and precision were both un- 
bounded). 

The MIPS architecture requires that loss of accuracy be detected as an 
inexact result. 

Trap Enabled Results: When an underflow trap is enabled, underflow 
is signaled when tininess is detected regardless of loss of accuracy. If 
underflow traps are enabled, the result register is not modified, and the 
source registers are preserved. 
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Trap Disabled Results: When an underflow trap is not enabled and 
FCSR.FS is clear, then take an unimplemented exception. When an 
underflow trap is not enabled and FCSR.FS is set, raise Inexact and return 
either 0 or ±2 £min , as appropriate for the current rounding mode. 

Unimplemented Instruction Exception (E) 

Any attempt to execute an instruction with an operation code or format 
code that has been reserved for future definition sets the Unimplemented 
bit in the Cause field in the FPU Control/ Status register and traps. The 
operand and destination registers remain undisturbed and the instruction 
is emulated in software. Any of the IEEE Standard 754 exceptions can 
arise from the emulated operation, and these exceptions in turn are 
simulated. 

The Unimplemented Instruction exception can also be signaled when 
unusual operands or result conditions are detected that the implemented 
hardware cannot handle properly. These include: 

• Denormalized operand 

• Quiet NaN operand 

• Underflow 

• Reserved opcodes 

• Unimplemented formats 

• Conversion of a floating-point number to a fixed point format when an 
overflow occurs or the source operand value is Infinity or a NaN. 

• Operations which are invalid for their format (for instance, CVT.S.S) 

Denormalized and NaN operands are only trapped if the instruction is a 

convert or computational operation. Moves and compares do not trap if 
their operands are either denormalized or NaNs. 

The use of this exception for such conditions is optional; most of these 
conditions are newly developed and are not expected to be widely used in 
early implementations. Loopholes are provided in the architecture so that 
these conditions can be implemented with assistance provided by 
software, maintaining full compatibility with the IEEE Standard 754. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: This trap cannot be disabled. 

Saving and Restoring State 

Sixteen or thirty-two doubleword coprocessor load or store operations 
save or restore the coprocessor floating-point register state in memory. 
The remainder of control and status information can be saved or restored 
through Move To/From Coprocessor Control Register instructions, and 
saving and restoring the processor registers. Normally, the Control/ Status 
register is saved first and restored last. 

When the coprocessor Control/ Status register ( FCR31 ) is read, and the 
coprocessor is executing one or more floating-point instructions, the 
instruction(s) in progress are either completed or reported as exceptions. 
The architecture requires that no more than one of these pending 
instructions can cause an exception. Information indicating the type of 
exception is placed in the Control/ Status register. When state is restored, 
state information in the status word indicates that exceptions are pending. 

Writing a zero value to the Cause field of Control/ Status register clears 
all pending exceptions, permitting normal processing to restart after the 
floating-point register state is restored. 

The Cause field of the Control/ Status register holds the results of only 
one instruction; the FPU examines source operands before an operation is 
initiated to determine if this instruction can possibly cause an exception. 
If an exception is possible, the FPU executes the instruction in stall mode 
to ensure that no more than one instruction (that might cause an 
exception) is executed at a time. 
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Trap Handlers for IEEE Standard 754 Exceptions 

The IEEE Standard 754 strongly recommends that users be allowed to 
specify a trap handler for any of the five standard exceptions that can 
compute; the trap handler can either compute or specify a substitute 
result to be placed in the destination register of the operation. 

By retrieving an instruction using the processor Exception Program 
Counter (EPQ register, the trap handler determines: 

• exceptions occurring during the operation 

• the operation being performed 

• the destination format 

On Overflow or Underflow exceptions (except for conversions), and on 
Inexact exceptions, the trap handler gains access to the correctly rounded 
result by examining source registers and simulating the operation in 
software. 

On Overflow or Underflow exceptions encountered on floating-point 
conversions, and on Invalid Operation and Divide-by-Zero exceptions, the 
trap handler gains access to the operand values by examining the source 
registers of the instruction. 

The IEEE Standard 754 recommends that, if enabled, the overflow and 
underflow traps take precedence over a separate inexact trap. This 
prioritization is accomplished in software; hardware sets the bits for both 
the Inexact exception and the Overflow or Underflow exception. 
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Processor Signal Chapter 8 

Descriptions 


Introduction 

This chapter describes the signals used by and in conjunction with the 
R4600/R4700 processor. The signals include the System interface, the 
Clock/Control interface, the Interrupt interface, the Joint Test Action 
Group (JTAG) interface, and the Initialization interface. 

Signals are listed in bold, and low active signals have a trailing asterisk 
— for instance, the low-active Read Ready signal is RdRdy*. The signal 
description also tells if the signal is an input (the processor receives it) or 
output (the processor sends it out). 

Figure 8. 1 illustrates the functional groupings of the processor signals. 
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System Interface Signals 

System interface signals provide the connection between the R4600/ 
R4700 processor and the other components in the system. Table 8. 1 lists 
the system interface signals. 


Name 

Definition 

Direction 

Description 

ExtRqst* 

External request 

Input 

An external agent asserts ExtRqst* to 
request use of the System interface. The pro- 
cessor grants the request by asserting 

Release*. 

Release* 

Release interface 

Output 

In response to the assertion of ExtRqst* or a 
CPU read request, the processor asserts 
Release*, signalling to the requesting device 
that the System interface is available. 

RdRdy* 

Read ready 

Input 

The external agent asserts RdRdy* to indi- 
cate that it can accept a processor read 
request. 


System address/ 

data bus 

Input/ 

Output 

A 64 -bit address and data bus for communi- 
cation between the processor and an external 
agent. 

SysADC(7:0) 

System address/ 
data check bus 

Input/ 

Output 

An 8-bit bus containing check bits for the 
SysAD bus. 

SysCmd (8:0) 

System com- 
mand/data identi- 
fier 

Input/ 

Output 

A 9-bit bus for command and data identifier 
transmission between the processor and an 
external agent. 

SysCmdP 

System com- 
mand/data identi- 
fier bus parity 

Input/ 

Output 

A single, even-parity bit for the SysCmd bus. 

Validln* 

Valid input 

Input 

; 

The external agent asserts Validln* when it 
is driving a valid address or data on the 

SysAD bus and a valid command or data 
identifier on the SysCmd bus. 

ValldOut* 

Valid output 

Output 

The processor asserts ValidOut* when it is 
driving a valid address or data on the SysAD 
bus and a valid command or data identifier 
on the SysCmd bus. 

WrRdy* 

Write ready 

Input 

An external agent asserts WrRdy* when it 
can accept a processor write request. 


Table 8.1 System Interface Signals 
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Clock/Control Interface Signals 

The Clock/Control interface signals make up the interface for clocking 
and maintenance. 

Table 8.2 lists the Clock/ Control interface signals. 


Name 

Definition 

Direction 

Description 


I/O output 

Output 

Reserved for future output. 
Always High. 

IOIn 

I/O input 

Input 

Reserved for future input 

Should be driven High. 

MasterClock 

Master clock 

Input 

Master clock input that estab- 
lishes the processor operating 
frequency. It is 1/2 the pipeline 
frequency. 

MasterOut 

Master clock out 

Output 

Master clock output aligned with 

MasterClock. 

RClock(l:0) 

Receive clocks 

Output 

Two identical receive clocks that 
establish the System interface 
frequency. 

SyncOut 

Synchronization 
clock out 

Output 

SyncOut must be connected to 
Syncln through an interconnect 
that models the interconnect 
between MasterOut, TClock, 
RClock, and the external agent. 

Syncln 

Synchronization 
clock in 

Input 

Synchronization clock input. 

TClock(l:0) 

Transmit clocks 

Output 

Two identical transmit clocks 
that establish the System inter- 
face frequency. 

Fault* 

Fault 

Output 

Reserved for future output. 
Always High. 

VccP 

Quiet Vcc for PLL 

Input 

Quiet Vcc for the internal phase 
locked loop. 

VssP 

Quiet Vss for PLL 

Input 

Quiet Vss for the internal phase 
locked loop. 


Table 8.2 Clock/Control Interface Signals 
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Interrupt Interface Signals 

The Interrupt interface signals make up the interface used by external 
agents to interrupt the R4600/R4700 processor. Six hardware interrupts 
(Int*(5:0)) and one NMI are available on the R4600/R4700. Table 8.3 lists 
the Interrupt interface signals. 


Name 

Definition 

Direction 

Description 

Int*(5:0) 

Interrupt 

Input 

Six general processor interrupts, bit-wise ORed 
with bits 5:0 of the interrupt register. 

NMI* 

Nonmaskable 

Interrupt 

Input 

Nonmaskable interrupt, ORed with bit 6 of the 
interrupt register. 


Table 8.3 Interrupt Interface Signals 


JTAG Interface Signals 

The R4600/R4700 does not implement JTAG. The signals are provided 
for compatibility with existing R4x00PC designs. 

Table 8.4 lists the JTAG interface signals. 


Name 

Definition 

Direction 

Description 

JTDI 

JTAG data in 

Input 

Connected directly to JTDO. No JTAG imple- 
mented. Should be pulled High. 

JTCK 

TAG clock input 

Input 

Unused input. Should be pulled High. 

JTDO 

JTAG data out 

Output 

Connected directly to JTDI. If no external 
scan used, this is a no connect. 

JTMS 

JTAG command 

Input 

Unused input. Should be pulled High. 


Table 8.4 JTAG Interface Signals 
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Initialization Interface Signals 

The Initialization interface signals make up the interface by which an 
external agent initializes the processor operating parameters. Table 8.5 
lists the Initialization interface signals. 


Name 

Definition 

Direction 

Description 

ColdReset* 

Cold reset 

Input 

This signal must be asserted for a 
power on reset or a cold reset The 
clocks SClock, TClock, and 

RClock begin to cycle and are syn- 
chronized with the deasserted edge 
of ColdReset*. ColdReset* must 
be deasserted synchronously with 
MasterClock. 

ModeClock 

Boot mode clock 

Output 

Serial boot-mode data clock output; 
runs at the Master Clock frequency 
divided by 256: (MasterClock/ 

256). 

Modeln 

Boot mode data In 

Input 

Serial boot-mode data input. 

Reset* 

Reset 

Input 

This signal must be asserted for any 
reset sequence. It can be asserted 
synchronously or asynchronously 
for a cold reset, or synchronously to 
initiate a warm reset. Reset* must 
be deasserted synchronously with 
MasterClock. 

VCCOk 

Vcc is OK 

Input 

When asserted, this signal indicates 
to the processor that V C c > V cc min 
for more than 100 milliseconds and 
will remain stable. The assertion of 
VCCOk initiates the initialization 
sequence. 


Table 8.5 Initialization Interface Signals 
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Table 8.6 lists the R4600/R4700 processor signals and their possible 
states. 



Description 

Name 

I/O 

Asserted 

State 

3-State 

Reset 

State 

System address /data bus 



High 

Yes 

a 

System address /data check bus 



High 

Yes 

a 

System command/data identifier bus 

SysCmd(8:0) 

I/O 

High 

Yes 

a 

System command/data identifier bus parity 

SysCmdP 

I/O 

High 

Yes 

a 

Valid input 

Validln* 

I 


No 

NA 

Valid output 

ValidOut* 

o 

Low 

Yes 

b 

External request 

ExtRqst* 

I 

Low 

No 

NA 

Release interface 

Release* 

o 

Low 

Yes 

b 

Read ready 


I 

Low 

No 

NA 

Write ready 



Low 

No 

NA 

Interrupts 

Int*(5:0) 

I 

Low 

No 

NA 

Nonmaskable interrupt 


I 

Low 

No 

NA 

Boot mode data in 


I 

High 

No 


Boot mode clock 

ModeClock 



No 

d 


JTAG data in 

JTD1 


High 

No 

NA 

JTAG data out 

JTDO 

O 

High 

Yes 

b 



JTAG command 

JTMS 

I 

High 

No 

NA 

JTAG clock input 

JTCK 

I 


No 

NA 

Transmit clocks 

TClock(l:0) 

o 

High 

Yes 

c 

Receive clocks 

RClock(l:0) 

o 


Yes 

c 

Master clock 

MasterClock 

I 

| 

No 

NA 

Master clock out 

MasterOut 

o 


Yes 

c 

Synchronization clock out 

SyncOut 

o 


Yes 

c 

Synchronization clock in 

Syncln 

I 

High 

No 

NA 

I/O output 

IOOut 

o 


Yes 

b 

I/O input 

lOIn 

I 

High 

No 


Vcc is OK 

VCCOk 

I 

High 

No 

NA 

Cold reset 

ColdReset* 

I 

Low 

No 

NA 

Reset 


I 

Low 

No 

NA 

Fault 

Fault* 

o 

Low 

Yes 

b 

Key to Reset State Column: 

a All I/O pins (SysAD[63:0], SysADC[7:0J, etc.) remain 3-stated until the Reset* signal deasserts. 

b All output only pins (ValidOut*, Release*, etc.), except the clocks, are 3-stated until the ColdReset* 

signal deasserts. 

c All clocks, except ModeClock, are 3-stated until VCCOk asserts, 

d ModeClock is always driven. 

NA Not applicable to input pins. 


Table 8.6 R4600/R4700 Processor Signal Summary 
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Integrated Device Technology, Inc. 


Introduction 

This chapter describes the R4600/R4700 Initialization interface. This 
includes the reset signal description and types, initialization sequence, 
with signals and timing dependencies, and boot modes, which are set at 
initialization time. 

Signal names are listed in bold letters — for instance the signal VCCOk 
indicates the Vcc voltage is stable. Low-active signals are indicated by an 
asterisk at the end of the name, as in ColdReset*. 

Functional Overview 

The R4600/R4700 processor has the following three types of resets. 
Refer to Figure 9. 1 on page 9-4, Figure 9.2 on page 9-5, and Figure 9.3 on 
page 9-6 for timing diagrams of these resets. 

• Power-on reset: Starts when the power supply is turned on and 

completely reinitializes the internal state machine of 
the processor without saving any state information. 

• Cold reset: Restarts all clocks, but the power supply remains 

stable. A cold reset completely reinitializes the 
internal state machine of the processor without 
saving any state information. 

• Warm reset: Restarts processor, but does not affect clocks. A 

warm reset preserves the processor internal state. 

These resets use the VCCOk, ColdReset*, and Reset* input signals, 
which are summarized in the next subsection. Descriptions of each type 
of reset operation is described 

The Initialization interface is a serial interface that operates at the 
frequency of the MasterClock divided by 256 (i.e. MasterClock/256). This 
low-frequency operation allows the initialization information to be stored 
in a low-cost EPROM or PLD. 

Reset and Initialization Signal Descriptions 

This section describes the three reset signals, VCCOk, ColdReset*, and 
Reset*, and the two initialization signals, Modeln and ModeClock. 

VCCOk: When asserted 1 , VCCOk indicates to the processor that the 5.0 
(3.3) volt power supply (Vcc) has been above 4.75 (3.0) volts for 
more than 100 m il liseconds (ms) and is expected to remain 
stable. The assertion of VCCOk initiates the reading of the 
boot-time mode control serial stream. This is described in the 
subsection “Initialization Sequence” on page 9-4. 

ColdReset*: The ColdReset* signal must be asserted (low) for either a 
power-on reset or a cold reset. The clocks SClock, TClock, and 
RClock begin to cycle and are synchronized with the 
de-asserted edge (high) of ColdReset*. ColdReset* must be 
de-asserted synchronously with MasterClock. 

Reset*: The Reset* signal must be asserted for any reset sequence. It 
can be asserted synchronously or asynchronously for a cold 
reset, or synchronously to initiate a warm reset Reset* must 
be de-asserted synchronously with MasterClock. 

Modeln: Serial boot mode data in. 

ModeClock: Serial boot mode data out, at the MasterClock frequency 
divided by 256 (MasterClock/256). 


1- Asserted means the signal is true, or in its valid state. For example, the low- 
active Reset* signal is said to be asserted when it is in a low (true) state; the 
high-active VCCOk signal is true when it is asserted high. 
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Table 9. 1 lists the processor signals and their possible states. 


Description 

— 

Name 

I/O 

Asserted State 



System address/ data bus 

SysAD(63:0) 

I/O 

High 

Yes 

a 

System address/data check bus 

SysADC(7:0) 

I/O 

High 

Yes 

a 

System command/data identifier bus 

SysCmd(8:0) 

I/O 

High 

Yes 

a 

System command/data identifier bus parity 

SysCmdP 

I/O 

High 


a 

Valid input 

Validln* 

I 

Low 

No 

NA 

Valid output 

ValidOut* 

o 

Low 


b 

External request 

ExtRqst* 

I 

Low 


NA 

Release interface 

Release* 

O 

Low 

Yes 

b 

Read ready 


■a 

Low 

No 

NA 

Write ready 


m 

Low 

No 

NA 

Interrupts 

Int*(5:0) 

I 

Low 

No 

NA 

Nonmaskable interrupt 

NMI* 

i 

Low 

No 

NA 

Boot mode data in 

Modeln 

I 

High 

No 

NA 

Boot mode clock 

ModeClock 

O 

High 

No 

d 

JTAG data in 

JTDI 

I 

High 

No 

NA 

JTAG data out 

JTDO 

o 

High 

Yes 

b 

JTAG command 

JTMS 

I 

High 

No 

NA 

JTAG clock input 

JTCK 

I 

High 

No 

NA 

Transmit clocks 

TClock(l:0) 

o 

High 

Yes 

c 

Receive clocks 



High 

Yes 

c 

Master clock 



High 

No 

NA 

Master clock out 

MasterOut 

o 

High 

Yes 

c 

Synchronization clock out 

SyncOut 

o 

High 

Yes 

c 

Synchronization clock in 

Syncln 

I 

High 

No 

NA 

I/O output 

IOOut 

o 

High 

Yes 

b 

I/O input 

IOIn 

I 

High 

No 

NA 

Vcc is OK 

VCCOk 

I 

High 


NA 

Cold reset 



Low 


NA 

Reset 

Reset* 

I 

Low 

No 

NA 

Fault 



Low 


b 


Key to Reset State Column: 

a All I/O pins (SysAD[63:0J, SysADC[7:0|, etc.) remain 3-stated until the Reset* signal deasserts. 
b All output only pins (Valid Out*, Release*, etc.), except the clocks, are 3-stated until the ColdReset* signal 
deasserts. 

c All clocks, except ModeClock, are 3-stated until VCCOk asserts, 
d ModeClock is always driven. 

NA Not applicable to input pins. 


Table 9.1 R4600/R47 00 Processor Signal Summary 
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Power-on Reset 

Figure 9.1, Figure 9.2, and Figure 9.3 illustrate the power-on, warm, 
and cold resets. 

This is the sequence for a power-on reset: 

1. Power-on reset applies a stable Vcc of at least 4.5 (3.0) volts from the 
5.0 (3.3) volt power supply to the processor. During this time, VCCOk is 
deasserted, ColdReset* and Reset* are asserted and the MasterClock 
input oscillates. 

2. After at least 100 ms of stable Vcc and MasterClock, the VCCOk 
signal is asserted to the processor. The assertion of VCCOk begins the 
initialization of the processor. After the mode bits have been read in, the 
processor allows its internal phase locked loops to lock, stabilizing the 
processor internal clock, PClock, the SyncOut-SyncIn clock path 
(described in Chapter 10), and the master clock output, MasterOut. 

3. ColdReset* is asserted for at least 64K (or 2 16 ) MasterClock cycles 
after the assertion of VCCOk. Once the processor reads the boot-time 
mode control serial data stream, ColdReset* can be deasserted. 
ColdReset* must be deasserted synchronously with MasterClock. 

4. The deasserted edge of ColdReset* synchronizes the edges of SClock, 
TClock, and RClock (to all processors, if in a multiprocessor system). 

5. After ColdReset* is deasserted synchronously and SClock, TClock, 
and RClock have stabilized, Reset* is deasserted to allow the processor to 
begin running. (Reset* must be held asserted for at least 64 MasterClock 
cycles after the deassertion of ColdReset*.) Reset* must be deasserted 
synchronously with MasterClock. 

Note: ColdReset* must be asserted when VCCOk asserts. The behavior of the 
processor is undefined if VCCOk asserts while ColdReset* is deasserted. 

Cold Reset 

A cold reset can begin anytime after the processor has read the 
initialization data stream, causing the processor to start with the Reset 
exception. 

A cold reset requires the same sequence as a power-on reset except that 
the power is presumed to be stable before the assertion of the reset inputs 
and the deassertion of VCCOk. 

To begin the reset sequence, VCCOk must be deasserted for a minimum 
of 100 ms before reassertion. 

Warm Reset 

To execute a warm reset, the Reset* input is asserted synchronously 
with MasterClock. It is then held asserted for at least 64 MasterClock 
cycles before being deasserted synchronously with MasterClock. The 
processor internal clocks, PClock and SClock, and the System interface 
clocks, TClock and RClock, are not affected by a warm reset. The boot- 
time mode control serial data stream is not read by the processor on a 
warm reset. A warm reset forces the processor to start with a Soft Reset 
exception. 

The master clock output, MasterOut, generates any reset-related 
signals for the processor that must be synchronous with MasterClock. 

After a power-on reset, cold reset, or warm reset, all processor internal 
state machines are reset, and the processor begins execution at the reset 
vector. All processor internal states are preserved during a warm reset, 
although the precise state of the caches depends on whether or not a cache 
miss sequence has been interrupted by resetting the processor state 
machines. 
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Initialization Sequence 

The boot-mode Initialization sequence begins immediately after VCCOk 
is asserted. As the processor reads the serial stream of 256 bits through 
the Modeln pin, the boot-mode bits initialize all fundamental processor 
modes. (The signals used are described in Chapter 8). 

This is the initialization sequence: 

1. The system deasserts the VCCOk signal. The ModeClock output 
is held asserted. 

2. The processor synchronizes the ModeClock output at the time 
VCCOk is asserted. The first rising edge of ModeClock occurs at least 256 
MasterClock cycles after VCCOk is asserted. There could be more clock 
cycles due to internal delays on the VccOK signal. After the first rising 
edge, each additional rising edge will be 256 master clock cycles. 

3. Each bit of the initialization stream is presented at the Modeln pin 
after each rising edge of the ModeClock. The processor samples 256 
initialization bits from the Modeln input. 
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Figure 9.1 Power-on Reset 
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Figure 9.2 Cold Reset 


9-5 







Initialization Interface Chapter 9 



Figure 9.3 Warm Reset 


Boot-Mode Settings 

Unlike the R4000, the speed of the R4600/R4700 output drivers is 
statically controlled at boot time. 

Table 9.2 lists the processor boot-mode settings. The following rules 
apply to the boot-mode settings listed in the table: 

• Bit 0 of the stream is presented to the processor when VCCOk 
is first asserted. 

• Selecting a reserved value results in undefined processor behav- 
ior. 

• Bits 19 to 255 are reserved bits. 

• Zeros must be scanned in for all reserved bits. 


9-6 








Serial 

Bit 


Value 


Mode Setting Serial 
Bit 


Value 


Mode Setting 


i 


0 

Reserved (must be zero) 


9:10 

1:4 

XmitDatPat: System interface data rate for 
block writes only (bit 4 most significant) 



0 

DDDD 



1 

DDxDDx 



2 

DDxxDDxx 



3 

DxDxDxDx 

11 


4 

DDxxxDDxxx 



5 

DDxxxxDDxxxx 



6 

DxxDxxDxxDxx 

12 


7 

DDxxxxxxDDxxxxxx 

13:14 


8 

DxxxDxxxDxxxDxxx 



9-15 

Reserved 


5:7 

SysCkRatio: PClock to SClock divisor, fre- 
quency relationship between SClock, RClock, 
and TClock and PClock, bit 7 most significant. 



0 

Divide by 2 



1 

Divide by 3 

15 


2 

Divide by 4 



3 

Divide by 5 

16 


4 

Divide by 6 



5 

Divide by 7 

17 


6 

Divide by 8 



7 

Reserved 

18 

8 

EndBlt: Specifies byte ordering 



0 

Little-endian 

ordering 

19:255 


1 

Big-endian 

ordering 



i 

Non-block Write: Selects the manner in 
which non-block writes are handled, bit 10 
is most significant 

0 

R4x00 compatible 

1 

Reserved 

2 

Pipelined Writes 

3 

Write re-issue 

TmrlntEn: Disables the timer interrupt on 

Int*[5] 


0 

Enabled Timer Interrupt 

1 

Disabled Timer Interrupt 

Reserved (must be zero) 


Drv_Out: Output driver slew rate control. 
Bit 14 is most significant. Affects only out- 
puts that are not clocks. 


10 

100% strength (fastest) 

11 

83% strength 

00 

67% strength 

01 

50% strength (slowest) 

Tclock[0]: 


[0] Enabled. 

[1] Disabled. 

Tclock[l]: 

[0] Enabled. 

[1] Disabled. 

Rclock[0]: 

[0] Enabled. 

[1] Disabled. 

Rclock[l]: 

[01 Enabled. 

[1] Disabled. 

Reserved (must be zero) 


Table 9.2 Boot-Mode Settings 
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Introduction 

This chapter describes the clock signals (“clocks”) used in the R4600/ 
R4700 processor and the processor status reporting mechanism. 

The subject matter includes basic system clocks, system timing 
parameters, connecting clocks to a phase-locked system, connecting 
clocks to a system without phase locking, and processor status outputs. 

Signal Terminology 

The following terminology is used in this chapter (and book) when 
describing signals: 

• Rising edge indicates a low-to-high transition. 

• Falling edge indicates a high-to-low transition. 

• Clock-to-Q delay is the amount of time it takes for a signal to move 
from the input of a device (dock) to the output of the device (0). 

Figure 10. 1 and Figure 10.2 illustrate these terms. 



— 

data in 

Q- 

data out 

► 

r^LJ^Lj 

clock input 

r ~\ 

> 

Clock-to-Q 

delay 

◄ ► 



Figure 10.2 Clock-to-g Delay 


Basic System Clocks 

The various clock signals used in the R4600/R4700 processor are 
described below, starting with MasterClock, upon which the processor 
bases all internal and external clocking. Note: All output clocks will have 
approximately a 50% duty cycle ± the jitter and any difference in rise and/ 
or fall times. 

MasterClock 

The processor bases all internal and external clocking on the single 
MasterClock input signal. The processor generates the clock output 
signal, MasterOut, at the same frequency as MasterClock and aligns 
MasterOut with MasterClock, if Syncln is properly connected to 

SyncOut. 
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MasterOut 

The processor generates the clock output signal, MasterOut, at the 
same frequency as MasterClock and aligns MasterOut with MasterClock, 
if Syncln is properly connected to SyncOut. MasterOut clocks certain 
external logic, such as the reset logic. 

SyncIn/SyncOut 

The processor generates SyncOut at the same frequency as 
MasterClock and aligns Syncln with MasterClock. 

SyncOut must be connected to Syncln either directly, or through an 
external buffer. The processor can compensate for both output driver and 
input buffer delays (and, when necessary, delay caused by an external 
buffer according to the connections of TClock and RClock to the rest of 
the system) when aligning Syncln with MasterClock. Figure 10.8 on 
page 10-9 gives an illustration of SyncOut connected to Syncln through 
an external buffer. 

PClock 

The processor generates an internal clock, PClock, at twice the 
frequency of MasterClock and precisely aligns every other rising edge of 
PClock with the rising edge of MasterClock. 

All internal registers and latches use PClock, which is the pipeline clock 
rate. 

SClock 

The R4600/R4700 processor divides PClock by 2, 3, 4, 5, 6, 7 or 8, 
programmed at boot-mode initialization to generate the internal clock 
signal, SClock. The processor uses SClock to sample data at the system 
interface and to clock data into the processor system interface output 
registers. 

The first rising edge of SClock, after ColdReset* is deasserted, is 
aligned with the first rising edge of MasterClock. 

TClock 

TClock (transmit clock) clocks the output registers of an external agent, 
and can be a global system clock for any other logic in the external agent. 

TClock is identical to SClock. The edges of TClock align precisely with 
the edges of SClock and TClock can also be aligned with MasterClock, 
when Syncln is properly connected to SyncOut. 

RClock 

The external agent uses RClock (receive clock) to clock its input 
registers. The processor generates RClock at the same frequency as 
SClock, although RClock leads TClock and SClock by 25 percent of 
SClock cycle time. 
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Figure 10.3 shows the clocks for a PClock-to-SClock division by 2. 



System Timing Parameters 

As shown in Figure 10.3, data provided to the processor must be stable 
a minimum of t DS nanoseconds (ns) before the rising edge of SClock and 
be held valid for a minimum of t DH ns after the rising edge of SClock. 

Alignment to SClock 

Processor data becomes stable a minimum of t DM ns and a maximum of 
too ns after the rising edge of SClock. This drive-time is the sum of the 
maximum delay through the processor output drivers together with the 
maximum clock-to-Q delay of the processor output registers. 

Alignment to MasterClock 

Certain processor inputs (specifically VCCOk, ColdReset*, and Reset*) 
are sampled based on MasterClock, while others are output based on 
MasterClock. The same setup, hold, and drive-off parameters, t DS , t DH , 

toM- and too. shown in Figure 10.3, apply to these inputs and outputs, but 
they are measured relative to MasterClock instead of SClock. 

Phase-Locked Loop (PLL) 

The processor aligns SyncOut, PClock, SClock, TClock, and RClock 

with internal phase-locked loop (PLL) circuits that generate aligned clocks 
based on SyncOut/SyncIn. By their nature, PLL circuits are only capable 
of generating aligned clocks for MasterClock frequencies within a limited 
range. 
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Clocks generated using PLL circuits contain some inherent inaccuracy, 
or jitter, a clock aligned with MasterClock by the PLL can lead or trail 
MasterClock by as much as the related maximum jitter specified in the 
data sheet. 

PLL Components and Operation 

The passive components required for the Phase Locked Loop circuit are 
contained in the packages for the R4600 and R4700. There are no required 
external passive components. 

Passive Components 

The Phase Locked Loop circuit requires several passive components for 
proper operation, which are connected to PLLCapO, PLLCapl, VccP, and 
VssP, as illustrated in Figure 10.4. 


R4600/R4700 



Figure 10.4 PLL Passive Components 

It is essential to isolate the analog power and ground for the PLL circuit 
(VccP/VssP) from the regular power and ground (Vcc/Vss). Initial 
evaluations have yielded good results with the following values: 


R = 

5 

ohms 

Cl = 

1 

nF 

C2 = 

82 

nF 

C3 = 

10 

(IF 

Cp = 

470 pF 


Since the optimum values for the filter components depend upon the 
application and the system noise environment, these values should be 
considered as starting points for further experimentation within your 
specific application. 
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Figure 10.5 shows the internal PLL and clock distribution network of the 
R4600/R4700. 



Connecting Clocks to a Phase-Locked System 

When the processor is used in a phase-locked system, the external agent 
must phase lock its operation to a common MasterClock. In such a 
system, the delivery of data and data sampling have common 
characteristics, even if the components have different delay values. For 
example, transmission time (the amount of time a signal takes to move from 
one component to another along a trace on the board) between any two 
components A and B of a phase-locked system can be calculated from the 
following equation: 

Transmission Time = (SClock period) - (t DO for A) - (t DS for B) - 
(Clock Jitter for A Max) - (Clock Jitter for B Max) 
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Figure 10.6 shows a block-level diagram of a phase-locked system using 
the R4600/R4700 processor. 


MasterClock 


R4600/R4700 

MasterClock 

SysCmd 

SysAD 

SyncOut 

Syncln 

RCIock 

TCIock 


Figure 10.6 R4600/R4700Processor Phase-Locked System 

Connecting Clocks to a System without Phase Locking 

When the R4600/R4700 processor is used in a system in which the 
external agent cannot lock its phase to a common MasterClock, the 
output clocks RCIock and TCIock can clock the remainder of the system. 
Two clocking methodologies are described in this section: connecting to a 
gate-array device or connecting to discrete CMOS logic devices. 

Connecting to a Gate-Array Device 

When connecting to a gate-array device, both RCIock and TCIock are 
used within the gate-array. The gate array internally buffers RCIock and 
uses this buffered version to clock registers that sample processor 
outputs. 

These sampling registers should be immediately followed by staging 
registers clocked by an internally buffered version of TCIock. This buffered 
version of TCIock should be the global system clock for the logic inside the 
gate array and the clock for all registers that drive processor inputs. 
Figure 10.7 on page 7 is a block diagram of this circuit. 

Staging registers place a constraint on the sum of the clock-to-Q delay 
of the sample registers and the setup time of the synchronizing registers 
inside the gate arrays, as shown in the following equation: 

Clock-to-Q Delay + Setup of Synch Register < 0.25 (RCIock period) 

- (Max Clock Jitter for RCIock) 

- (Max Delay Mismatch for Clock Buffers on RCIock and TCIock) 


External Agent 

MasterClock 

SysCmd 

SysAD 
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Figure 10.7 is a block diagram of a system without phase lock, using the 
R4600/R4700 processor with an external agent implemented as a gate 
array. 



Figure 10.7 Gate-Array System Without Phase Lock, Using the 
R4600/R4700 Processor 


In a system without phase lock, the transmission time for a signal from 
the processor to an external agent composed of gate arrays can be 
calculated from the following equation: 

Transmission Time = (75 percent of TGock period) - (t DO for R4600/R4700) 

+ (Min External Clock Buffer Delay) 

- (External Sample Register Setup Time) 

- (Max Clock Jitter for R4600/R4700 Internal Clocks) 

- (Max Clock Jitter for RClock) 
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The transmission time for a signal from an external agent composed of 
gate arrays to the processor in a system without phase lock can be 
calculated from the following equation: 

Transmission Time = (TClock period) - (t DS for R4600/R4700) 

- (Max External Clock Buffer Delay) 

- (Max External Output Register Clock-to-Q Delay) 

- (Max Clock Jitter for TClock) 

- (Max Clock Jitter for R4600/R4700 Internal Clocks) 

Connecting to a CMOS Logic System 

The processor uses matched delay clock buffers to generate aligned 
clocks to external CMOS logic. A matched delay clock buffer is inserted in 
the SyncOut/SyncIn alignment path of the processor, skewing SyncOut, 
MasterOut, RClock, and TClock to lead MasterClock by the buffer delay 
amount, while leaving PClock aligned with MasterClock. 

The remaining matched delay clock buffers are available to generate a 
buffered version of TClock aligned with MasterClock. Alignment error of 
this buffered TClock is the sum of the maximum delay mismatch of the 
matched delay clock buffers, and the maximum clock jitter of TClock. 

As the global system clock for the discrete logic that forms the external 
agent, the buffered version of TClock clocks registers that sample 
processor outputs, as well as clocking the registers that drive the processor 
inputs. 

The transmission time for a signal from the processor to an external 
agent composed of discrete CMOS logic devices can be calculated from the 
following equation: 

Transmission Time = (TClock period) - (t DO for R4600/R4700) 

- (External Sample Register Setup Time) 

- (Max External Clock Buffer Delay Mismatch) 

- (Max Clock Jitter for R4600/R4700 Internal Clocks) 

- (Max Clock Jitter for TClock) 
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Figure 10.8 Is a block diagram of a system without phase lock, 
employing the R4600/R4700 processor and an external agent composed of 
both a gate array and discrete CMOS logic devices. 


MasterClock 


R4600/R4700 

MasterClock 

SysCmd 

SysAD 

SyncOut 

Syncln 

RCIock 

TCIock 


Sample 

Registers 








imMMB 





■ ■■ 


Memory 


Figure 10.8 Gate Array and CMOS System Without Phase Lock, Us- 
ing the R4600/R4700 Processor 

The transmission time for a signal from an external agent composed of 
discrete CMOS logic devices can be calculated from the following equation: 


Transmission Time = (TCIock period) - (t^g for R4600/R4700) 

- (Max External Output Register Clock-to-Q Delay) 

- (Max External Clock Buffer Delay Mismatch) 

- (Max Clock Jitter for R4600/R4700 Internal Clocks) 

- (Max Clock Jitter for TCIock) 


In this clocking methodology, the hold time of data driven from the 
processor to an external sampling register is a critical parameter. To 
guarantee hold time, the minimum output delay of the processor, t DM , 
must be greater than the sum of the following: 

Min hold time for the external sampling register 

+ max clock jitter for R4600/R4700 internal clocks 
+ max clock jitter for TCIock 

+ max delay mismatch of the external clock buffers 
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Introduction 

This chapter describes in detail the cache memory: its place in the 
R4600/R4700 memory organization and individual operations of the 
primary cache. 

This chapter uses the following terminology: 

• The primary cache may also be referred to as the P-cache. 

• The primary data cache may also be referred to as the D -cache. 

• The primary instruction cache may also be referred to as the I-cache. 

These terms are used interchangeably throughout this book. 

Memory Organization 

Figure 11.1 shows the R4600/R4700 system memory hierarchy. In the 
logical memory hierarchy, caches lie between the CPU and main memory. 
They are designed to make the speedup of memory accesses transparent 
to the user. Each functioned block in Figure 11.1 has the capacity to hold 
more data than the block above it. For instance, physiced main memory 
has a larger capacity them the primary cache. At the same time, each 
functional block takes longer to access them any block above it. For 
instance, it takes longer to access data in main memory than in the CPU 
on-chip registers. 



Figure 11,1 Logical Hierarchy of Memory 


The R4600/R4700 processor heis two on-chip primary caches: one holds 
instructions (the instruction cache), the other holds data (the data cache). 
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Overview of Cache Operations 

As described earlier, caches provide fast temporary data storage, and 
they make the speedup of memory accesses transparent to the user. In 
general, the processor accesses cache-resident instructions or data 
through the following procedure: 

1. The processor, through the on-chip cache controller, attempts to 
access the next instruction or data in the primary cache. 

2. The cache controller checks to see if this instruction or data is present 
in the primary cache. 

• If the instruction/data is present, the processor retrieves it. This is 
called a primary-cache hit. 

• If the instruction/data is not present in the primary cache, it is re- 
trieved as a cache line from memory and is written into the primary 
cache. 

3. The processor retrieves the instruction/data from the primary cache 
and operation continues. For a data cache miss, the processor can restart 
the pipeline after the first doubleword (the one at the miss address) is 
retrieved and continues the cache line refill in parallel. 

It is possible for the same data to be in two places simultaneously: main 
memory and the primary cache. This data is kept consistent through the 
use of either a write-back or a write-through methodology. For a write-back 
cache, the modified data is not written back to memory until the cache line 
is replaced. In a write-through cache, the data is written to memory as the 
cached data is modified (with a possible delay due to the write buffer). 

R4600/R4700 Cache Description 

This section describes the organization of on-chip primary caches. As 
Figure 1 1 . 1 on page 1 shows, the R4600/R4700 contains separate primary 
instruction and data caches. 

Figure 11.2 provides block diagrams of the R4600/R4700 memory 
model. 



Figure 11.2 Cache Support in the R4600/R4700 


Cache Line Size 

A cache line is the smallest unit of information that can be fetched from 
memory to be filled into the cache. A primary cache line is 8 words in 
length, and is represented by a single tag. 

Upon a cache miss in the primary cache, the missing cache line is 
loaded from memory into the primary cache. 

Cache Organization and Accessibility 

This section describes the organization of the primary cache, including 
the manner in which it is mapped, the addressing used to index the cache, 
and composition of the cache lines. The primary instruction and data 
caches are indexed with a virtual address (VA). 


11-2 









Cache Organization, Operation and Coherency 


Chapter 11 


Organization of the Primary Instruction Cache (I-Cache) 

Each line of primary I-cache data (although it is actually an instruction, 
it is referred to as data to distinguish it from its tag) has an associated 28- 
bit tag that contains a 24-bit physical address, a single valid bit, a reserved 
bit, a single parity bit and the FIFO replacement bit. Word parity is used 
on I-cache data. 

The R4600/R4700 processor primary I-cache has the following 
characteristics: 

• two-way set associative 

• indexed with a virtual address 

• checked with a physical tag 

• organized with 8-word (32-byte) cache line. 

Figure 1 1.3 shows the format of a primary I-cache line. 


27 

26 

25 

24 

23 




0 

E 

fi 

[I] 

□ 

PTag | 

1 

1 

1 

1 


24 

65 

64 63 

0 


PTag Physical tag (bits 35:12 of the physical address) 

V Valid bit 

F FIFO Replacement Bit. Complemented on refill. 

P Even parity for the PTag and V fields 

DataP Even parity; 1 parity bit per word of data 

Data Cache data 


DataP 

Data 

DataP 

Data 

DataP 

Data 

DataP 

Data 


64 


Figure 11.3 R4600/R4700 Primary I-Cache Line Format 


Organization of the Primary Data Cache (D-Cache) 

Each line of primary D-cache data has an associated 30-bit tag that 
contains a 24-bit physical address, 2-bit cache line state, a write-back bit, 
a parity bit for the physical address and cache state fields, a parity bit for 
the write-back bit and the FIFO replacement bit. 

The R4600/R4700 processor primary D-cache has the following 
characteristics: 

• write-back or write-through on a per-page basis 

• two-way set associative 

• indexed with a virtual address 

• checked with a physical tag 

• organized with 8-word (32-byte) cache line. 
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Figure 1 1.4 shows the format of a primaiy D-cache line. 


29 28 27 26 25 24 23 0 



8 64 


F FIFO Replacement Bit 

W Even parity for the write-back bit 

W Write-back bit (set if cache line has been written) 

P Even parity for the PTag and CS fields 

CS Primaiy cache state: 

0 = Invalid, 1 = Shared, 

2 = Clean Exclusive, 3 = Dirty Exclusive 

PTag Physical tag (bits 35: 12 of the physical address) 

DataP Even parity for the data; 1-bit per byte 

Data Cache data 

Figure 11.4 R4600/R4700 8-Word Primary Data Cache Line Format 

In the R4600/R4700, the W (write-back) bit, not the cache state, 
indicates whether or not the primaiy cache contains modified data that 
must be written back to memory. 

Note: There is no hardware support for cache coherency. Thus the only 
cache states used are Dirty Exclusive and Invalid. 
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Accessing the Primary Caches 

Figure 11.5 shows the virtual address (VA) index into the primary 
caches. Each instruction and data cache size is 16 Kbytes. 



Figure 11.5 Primary Cache Data and Tag Organization 


Cache States 

The terms below are used to describe the state of a cache line: 

• Exclusive: a cache line that is present in exactly one cache in the sys- 
tem is exclusive. This is always the case for the R4600/R4700. All 
cache lines are in an exclusive state. 

• Dirty: a cache line that contains data that has changed since it was 
loaded from memory is dirty. 

• Clean: a cache line that contains data that has not changed since it 
was loaded from memory is clean. 

• Shared: a cache line that is present in more than one cache in the 
system. The R4600/R4700 does not provide for hardware cache co- 
herency. This state should never happen in normal operations. 

The R4600/R4700 only supports the four cache states as shown in 
Table 11.1 on page 6. The only states that will occur in the R4600/R4700, 
under normal operations are the Dirty Exclusive and Invalid states. 

Note: EVen though valid data is in the Dirty Exclusive state, it may still 
be consistent with memory. One must look at the dirty bit, W, to determine 
if the cache line is to be written back to memory when it is replaced. 
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Each primary cache line in the R4600/R4700 system is in one of the 
states described in Table 11.1. 


Cache Line 
State 

Description 

Invalid 

A cache line that does not contain valid information must be marked invalid, and cannot 
be used. A cache line in any other state than invalid is assumed to contain valid informa- 
tion. 

Shared 

A cache line that is present in more than one cache in the system is shared. This state will 
not occur for normal operations. 

Clean Exclusive 

A clean exclusive cache line contains valid information and this cache line is not present 
in any other cache. The cache line is consistent with memory and is not owned by the pro- 
cessor (see “Cache Line Ownership” on page 6 in this chapter). This state will not occur 
for normal operations. 

Dirty Exclusive 

A dirty exclusive cache line contains valid information and is not present in any other 
cache. The cache line may or may not be consistent with memory and is owned by the 
processor (see “Cache Line Ownership” on page 6 in this chapter). Use the W bit to deter- 
mine if the line must be written back on replacement. 


Table 11.1 Cache States 


Primary Cache States 

Each primary data cache line is normally in one of the following states: 

• invalid 

• dirty exclusive 

Each primary instruction cache line is in one of the following states: 

• invalid 

• valid 

Cache Line Ownership 

The processor is the owner of a cache line when it is in the dirty 
exclusive state and is responsible for the contents of that line. There can 
only be one owner for each cache line. 

The ownership of a cache line is set and maintained through the rules 
described below. 

• A processor assumes ownership of the cache line if the state of the 
primary cache line is dirty exclusive. 

• A processor that owns a cache line is responsible for writing the cache 
line back to memory if the line is replaced during the execution of a 
Write-back or Write-back Invalidate cache instruction if the line is in 
a write-back page. The Cache instruction is explained in Appendix A. 

• Memory always owns clean cache lines 

• The processor gives up ownership of a cache line when the state of the 
cache line changes to invalid. 

Therefore, based on these rules and that any valid data cache line is in 
the Dirty Exclusive state (under normal operating conditions), the 
processor is considered to be the owner of the cache line. 

Cache Write Policy 

The R4600/R4700 processor manages its primary data cache by using 
either a write-back or a write-through policy on a per-page basis. In a 
write-back cache, the data is not written back to memory until the cache 
line is replaced. A write-through policy means the store data is written to 
the cache and to memory. The write of the data to memory may not occur 
at the same time as the write to cache due to the write buffer. 

For a write-back entry, if the cache line is valid and has been modified 
(the Wbit is set), the processor writes this cache line back to memory when 
the line is replaced, either in the course of satisfying a cache miss or during 
the execution of a Write-back or Write-back Invalidate CACHE instruction. 
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For a write-through entry, whenever a store hits in the cache line, the 
data is also written to memory via the write buffer. The store will not set or 
clear the W bit for a write-through cache line. This is to allow a different 
virtual address that maps to the same physical address and with a write- 
back policy to still set the W bit. For a miss to a write-through line, the 
action taken will be determined by the write-allocation policy. For a write- 
allocate entry, the cache line is first retrieved from memory and the store 
will then continue. A no write-allocate entry will just post the write to the 
system interface, via the write buffer, in the same manner as an uncached 
write. 

When the processor writes a cache line back to memory, it does not 
ordinarily retain a copy of the cache line, and the state of the cache line is 
changed to invalid. However, there are exceptions. For example, the 
processor retains a copy of the cache line if a cache line is written back by 
the Hit Write-back cache instruction. If the W bit is set, the cache line is 
written back and the W bit is cleared. The processor signals this line 
retention during a write by setting SysCmd(2) to a 1, as described in 
Chapter 12. 

Cache State Transition Diagrams 

The following sections describe the cache state diagrams that illustrate 
the cache state transitions for the primary cache. Figure 11.6 shows the 
state diagram of the primary cache. 

When an external agent supplies a cache line, it need not return the 
initial state of the cache line, for normal operations (see Chapter 12 for a 
definition of an external agent). This is because the only read request the 
R4600/R4700 should issue are for non-coherent data and the lower three 
bits for the data identifier are reserved. The initial state will automatically 
be set to DE by the R4600/R4700. Otherwise, the processor changes the 
state of the cache line during one of the following events: 

• A store to a dirty exclusive line remains in a dirty exclusive state. 

• The state is changed to invalid for: 

- A Cache invalidate operation. 

- If the line is replaced 



Cache Coherency Overview 

Systems using more than one master must have a mechanism to 
maintain data consistency throughout the system. This mechanism is 
called a cache coherency protocol. The R4600/R4700 does not provide 
any hardware cache coherency. Cache coherency must be handled with 
software. 

Cache Coherency Attributes 

Cache coherency attributes are necessary to ensure the consistency of 
data throughout the system. 
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Bits in the translation look-aside buffer (TLB) control coherency on a 
per-page basis. Specifically, the TLB contains 3 bits per entry that provide 
two possible coherency attribute types; they are listed below and described 
more fully in the following sections. 

• uncached 

• noncoherent (includes 3 attribute values) 

Table 11.2 summarizes the behavior of the processor on load misses and 
store misses for each of the coherency attribute types listed above. The 
following sections describe in detail these coherency attribute types 


Attribute Type 

Load Miss 

Store Miss 

Uncached 

Main memory read 

Main memory write 

Noncoherent 

Noncoherent read 

Noncoherent read (write-allocate page) 

Main memory write (no write-allocate page) 


Table 11.2 Coherency Attributes and Processor Behavior 


Uncached 

Lines within an uncached page are never in a cache. When a page has 
the uncached coherency attribute, the processor issues a doubleword, 
partial-doubleword, word, or partial-word read or write request directly to 
main memory (bypassing the cache) for any load or store to a location 
within that page. 

Noncoherent 

Lines with a noncoherent attribute type can reside in a cache; a load 
miss causes the processor to issue a noncoherent block read request to a 
location within the cached page. For a store miss to a write-allocate page, 
the processor issues a noncoherent block read request to a location within 
the cached page and then does the write-through. If the page has the no 
write-allocate attribute, a store miss will generate a write to the memory as 
in the uncached case. 

Cache Operation Modes 

The R4600/R4700 processor only supports the no-secondary-cache 
mode (only uncached and noncoherent coherency attributes are 
applicable) of R4x00 operation. 

R4600/R4700 Processor Synchronization Support 

In a multiprocessor system, it is essential that two or more processors 
working on a common task execute without corrupting each other’s 
subtasks. Synchronization, an operation that guarantees an orderly 
access to shared memory, must be implemented for a properly functioning 
multiprocessor system. Two of the more widely used methods are 
discussed in this section: test-and-set, and counter. Even though the 
R4600/R4700 does not support symmetric multi-processing (SMP), these 
are useful for multi-master and heterogenous multi-processing. 

Test-and-Set 

Test-and-set uses a variable called the semaphore, which protects data 
from being simultaneously modified by more than one processor. In other 
words, a processor can lock out other processors from accessing shared 
data when the processor is in a critical section, a part of program in which 
no more than a fixed number of processors is allowed to execute. In the 
case of test-and-set, only one processor can enter the critical section. 
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Figure 11.7 illustrates a test-and-set synchronization procedure that 
uses a semaphore; when the semaphore is set to 0, the shared data is 
unlocked, and when the semaphore is set to 1, the shared data is locked. 



Figure 11.7 Synchronization with Test-and-Set 

The processor begins by loading the semaphore and checking to see if it 
is unlocked (set to 0) in steps 1 and 2. If the semaphore is not 0, the 
processor loops back to step 1. If the semaphore is 0, indicating the shared 
data is not locked, the processor next tries to lock out any other access to 
the shared data (step 3). If not successful, the processor loops back to step 
1, and reloads the semaphore. 

If the processor is successful at setting the semaphore (step 4), it 
executes the critical section of code (step 5) and gains access to the shared 
data, completes its task, unlocks the semaphore (step 6), and continues 
processing. 

Counter 

Another common synchronization technique uses a counter. A counter 
is a designated memory location that can be incremented or decremented. 

In the test-and-set method, only one processor at a time is permitted to 
enter the critical section. Using a counter, up to N processors are allowed 
to concurrently execute the critical section. All processors after the Mh 
processor must wait until one of the N processors exits the critical section 
and a space becomes available. 

The counter works by not allowing more than one processor to modify it 
at any given time. Conceptually, the counter can be viewed as a variable 
that counts the number of limited resources (for example, the number of 
processes, or software licenses, etc.). 
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Figure 1 1.8 shows this process. 



Figure 11.8 Synchronization Using a Counter 

Load Linked and Store Conditional 

The R4600/R4700 instructions Load Linked (LL) and Store Conditional 
(SC) provide support for processor synchronization. These two 
instructions work very much like their simpler counterparts, load and 
store. The LL instruction, in addition to doing a simple load, has the side 
effect of setting a bit called the link bit This link bit forms a breakable link 
between the LL instruction and the subsequent SC instruction. The SC 
performs a simple store if the link bit is set when the store executes. If the 
link bit is not set, then the store fails to execute. The success or failure of 
the SC is indicated in the target register of the store. 

The link is broken upon completion of an ERET (return from exception) 
instruction. 

The most important features of LL and SC are: 

• They provide a mechanism for generating all of the common synchro- 
nization primitives including test-and-set, counters, sequencers, etc., 
with no additional overhead. 

• When they operate, bus traffic is generated only if the state of the 
cache line changes: lock words stay in the cache until some other pro- 
cessor takes ownership of that cache line. 
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Examples Using LL and SC 

Figure 11.9 shows how to Implement test-and-set using LL and SC 
instructions. 



Unlock semaphore 


Loop: LL r2,(r1) 

OR[ r3,r2,t '• 

BEQ r3.r2.Loop : 
NOP ; •" 

$Cr3,(rl) 



SW r2,{r1) ' 


Figure 11.9 Test-and-Set using LL and SC 
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Figure 1 1. 10 shows synchronization using a counter. 
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Introduction 

The System Interface allows the processor to access external resources 
needed to satisfy cache misses and uncached operations, while permit- 
ting an external agent access to some of the processor internal resources. 

This chapter describes the system interface from the point of view of 
both the processor and the external agent. 

Terminology 

The following terms are used in this chapter: 

An external agent is any logic device connected to the processor, over 
the system interface, that allows the processor to issue requests. 

A system event is an event that occurs within the processor and 
requires access to external system resources. 

Sequence refers to the precise series of requests that a processor gener- 
ates to service a system event. 

Protocol refers to the cycle-by-cycle signal transitions that occur on the 
system interface pins to assert a processor or external request. 

Syntax refers to the precise definition of bit patterns on encoded buses, 
such as the command bus. 

System Interface Description 

The R4600/R4700 processor supports a 64-bit address/data interface 
that can construct a simple uniprocessor with main memory. The System 
interface consists of: 

• 64-bit address and data bus, SysAD 

• 8-bit SysAD check bus, SysADC (even parity only) 

• 9-bit command bus, SysCmd 

• six handshake signals: 

- RdRdy*, WrRdy* 

- ExtRqst*, Release* 

- Validln*, ValidOut* 

The processor uses the system interface to access external resources in 
order to service processor requests such as cache misses, cache line 
write-backs, write-through stores and uncached operations. 
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Interface Buses 

Figure 12.1 shows the primary communication paths for the system 
Interface: a 64-blt address and data bus, SysAD(63:0), and a 9-bit 
command bus, SysCmd(8:0). These SysAD and the SysCmd buses are 
bidirectional: that is, they are driven by the processor to issue a processor 
request, and by the external agent to issue an external request (see 
“Processor and External Request Protocols” on page 12-14 for more infor- 
mation). 

A request through the system interface consists of: 

• an address 

• a System interface command that specifies the precise nature of the 
request 

• a series of data elements if the request is for a write or read response. 





R4600/R4700 


External Agent 

SysAD(63:0) 




SysCmd(8:0) 









Figure 12. 1 System Interface Buses 

Address and Data Cycles 

Cycles in which the SysAD bus contains a valid address are called 
address cycles. Cycles in which the SysAD bus contains valid data are 
called data cycles. Validity is determined by the state of the Validln* and 
ValidOut* signals (described in “Interface Buses” on page 12-2). 

The SysCmd bus identifies the contents of the SysAD bus during any 
cycle in which it is valid. The most significant bit of the SysCmd bus is 
always used to indicate whether the current cycle is an address cycle or a 
data cycle. 

• During address cycles [SysCmd(8) = 0], the remainder of the SysCmd 
bus, SysCmd(7:0), contains a System interface command (the encod- 
ing of system interface commands is detailed in “System Interface 
Commands and Data Identifiers” on page 12-32). 

• During data cycles [SysCmd(8) =1], the remainder of the SysCmd 
bus, SysCmd(7:0), contains a data identifier (the encoding of data 
identifiers is detailed later in this chapter). 


12-2 





System Interface 


Chapter 12 


Issue Cycles 

There are two types of processor issue cycles: 

• processor read request issue cycles 

• processor write request issue cycles. 

The processor samples the signal RdRdy* to determine the issue cycle 
for a processor read request; the processor samples the signal WrRdy* to 
determine the issue cycle of a processor write request. 

As shown in Figure 12.2, RdRdy* must be asserted for one clock cycle, 
two cycles prior to the address cycle of the processor read request to 
define the address cycle as the issue cycle (cycle 5 in Figure 12.2). 
RdRdy* does not need to be asserted during the issue cycle. 



* Issue 

SCycle 

|| 1 | 2 | 3 | 4 | 5 | 6 | 

SCIock 

i -\i\i\f\f\ru 

SysAD Bus 

1 


RdRdy* 

V / 

Note: RdRdy* must be sampled LOW at the end of cycle 3, 
which is marked with an asterisk. 


Figure 12.2 State of RdRdy* Signal for Read Requests 


As shown in Figure 12.3, WrRdy* must be asserted for one clock cycle, 
two cycles prior to the first address cycle of the processor write request to 
define the address cycle as the issue cycle (cycle 5 in Figure 12.3). 
WrRdy* does not need to be asserted during the issue cycle. 


* Issue 

SCycle ||l|2|3|4|5|e| 

sciock i 

SysADBus 1 X Addr X Data 

WrRdy* | \ / 


Note: WrRdy* must be sampled LOW at the end of cycle 3, 
which is marked with an asterisk. 


Figure 12.3 State of WrRdy* Signal for Write Requests 

The processor repeats the address cycle for the request until the condi- 
tions for a valid issue cycle are met. After the issue cycle, if the processor 
request requires data to be sent, the data transmission begins. There is 
only one issue cycle for any processor request. 

The processor accepts external requests, even while attempting to issue 
a processor request, by releasing the system interface to slave state in 
response to an assertion of ExtRqst* by the external agent. 
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Note that the rules governing the issue cycle of a processor request are 
strictly applied to determine the action the processor takes. The 
processor either: 

• completes the issuance of the processor request in its entirety before 
the external request is accepted, or 

• releases the system interface to slave state without completing the is- 
suance of the processor request. 

In the latter case, the processor issues the processor request (provided 
the processor request is still necessary) after the external request is 
complete. The rules governing an issue cycle again apply to the processor 
request. 

Handshake Signals 

The processor manages the flow of requests through the following six 
control signals: 

• RdRdy*, WrRdy* are used by the external agent to indicate when it 
can accept a new read (RdRdy*) or write (WrRdy*) transaction. 

• ExtRqst*, Release* are used to transfer control of the SysAD and 
SysCmd buses. ExtRqst* is used by an external agent to indicate a 
need to control the interface. Release* is asserted by the processor 
when it transfers the mastership of the system interface to the exter- 
nal agent. 

• The R4600/R4700 processor uses ValidOut* and the external agent 
uses Validln* to indicate valid command/data on the SysCmd/ 
SysAD buses. 

System Interface Protocols 

Figure 12.4 shows the system interface operates from register to 
register. That is, processor outputs come directly from output registers 
and begin to change with the rising edge of SClock . 1 

Processor inputs are fed directly to input registers that latch these 
input signals with the rising edge of SClock. This allows the system 
interface to run at the highest possible clock frequency. 



Output data 


h* 


Input data 


-► 


Figure 12.4 System Interface Regis ter-to-Regis ter Operation 


L SClock is an internal clock used by the processor to sample data at the system 
interface and to clock data into the processor system interface output registers; 
see Chapter 10 for more details. 
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Master and Slave States 

When the R4600/R4700 processor is driving the SysAD and SysCmd 
buses, the system interface is in master state. When the external agent is 
driving the SysAD and SysCmd buses, the system interface is in slave 
state. 

In master state, the processor drives the SysAD and SysCmd buses and 
will assert the signal ValidOut* whenever these buses are valid. 

In slave state, the external agent drives the SysAD and SysCmd buses 
and asserts the signal Validln* whenever these buses are valid. 

Moving from Master to Slave State 

The system interface remains in master state unless one of the following 
occurs: 

• The external agent requests and is granted the system interface (ex- 
ternal arbitration). 

• The processor issues a read request and performs an uncompelled 
change to slave state. 

External Arbitration 

The system interface must be in slave state for the external agent to 
issue an external request through the system interface. The transition 
from master state to slave state is arbitrated by the processor using the 
system interface handshake signals ExtRqst* and Release*. This transi- 
tion is described by the following procedure: 

1. An external agent signals that it wishes to issue an external request 
by asserting ExtRqst*. 

2. When the processor is ready to accept an external request, it releases 
the system interface from master to slave state by asserting Release* for 
one cycle. 

3. The system interface returns to master state as soon as the issue of 
the external request is complete. 

This process is described in “External Arbitration Protocol” on page 12- 
24. 

Uncompelled Change to Slave State 

An uncompelled change to slave state is the transition of the system 
interface from master state to slave state, initiated by the processor when 
a processor read request is pending. Release* is asserted automatically 
after a read request. An uncompelled change to slave state occurs during 
the issue cycle of a read request. 

After an uncompelled change to slave state, the processor returns to 
master state at the end of the next external request. This can be a read 
response, or some other type of external request. 

An external agent must note that the processor has performed an 
uncompelled change to slave state and begin driving the SysAD bus along 
with the SysCmd bus. As long as the system interface is in slave state, 
the external agent can begin a single external request without arbitrating 
for the system interface: that is, without asserting ExtRqst*. 

After the external request, the system interface returns to master state. 

Whenever a processor read request is pending, after the issue of a read 
request, the processor automatically switches the system interface to 
slave state, even though the external agent is not arbitrating to issue an 
external request. This transition to slave state allows the external agent 
to quickly return read response data. 
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Processor and External Requests 

There are two broad categories of requests: processor requests and 
external requests. These two categories are described in this section. 

When a system event occurs, the processor issues either a single 
request or a series of requests — called processor requests — through the 
system interface, to access an external resource and service the event. 
For this to work, the processor system interface must be connected to an 
external agent that is compatible with the system interface protocol, and 
can coordinate access to system resources. 

An external agent requesting access to a processor status register 
generates an external request. This access request passes through the 
system interface. System events and request cycles are shown in 
Figure 12.5. 


R4600/R4700 


External Agent 

Processor Requests : : : 

♦ Read 

♦ Write 


Externa) Requests 

* Read i 

* Write 
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System Events 

• load Sites 

* Store Miss 

* Store Hit on write-through 

• Uacached Load/Store 
; * CACHEop® rations : 


Figure 12.5 Requests and System Events 

Rules for Processor Requests 

The following rules apply to processor requests. 

• After issuing a processor read request, the processor cannot issue a 
subsequent read request until it has received a read response. 

• After the processor has issued a write request in R4x00 compatible 
write mode (set at boot time), the processor cannot issue a subsequent re- 
quest until at least four cycles after the issue cycle of the write request. 
This means back-to-back write requests with a single data cycle are sepa- 
rated by two unused system cycles, as shown in Figure 12.6. 

• After the processor has issued a write request in either of the two new 
write modes, write reissue and pipelined writes, the processor can issue a 
subsequent write immediately provided the WrRdy* requirement is meet. 
This is discussed in more detail later in this chapter. 
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Figure 12.6 Back-to-Back Write Cycle Timing 
(R4000 compatible mode) 


Processor Requests 

A processor request is a request or a series of requests, through the 
system interface, to access some external resource. As shown in 
Figure 12.7, processor requests include only reads and writes. 



Read request asks for a block, doubleword, partial doubleword, word, or 
partial word of data either from main memory or from another system 
resource. 

Write request provides a block, doubleword, partial doubleword, word, 
or partial word of data to be written either to main memory or to another 
system resource. 

Processor requests are managed by the processor in the equivalent of 
the R4000/R4400 no-secondary-cache mode. 

In no-secondary-cache mode, the processor issues requests in a strict 
sequential fashion; that is, the processor is only allowed to have one 
request pending at any time. For example, the processor issues a read 
request and waits for a read response before issuing any subsequent 
requests. The processor submits a write request only if there are no read 
requests pending. 

The processor has the input signals RdRdy* and WrRdy* to allow an 
external agent to manage the flow of processor requests. RdRdy* 
controls the flow of processor read requests, while WrRdy* controls the 
flow of processor write requests. 

The processor request cycle sequence is shown in Figure 12.8. 


12-7 






System Interface 


Chapter 12 


R4600/R4700 


External Agent 

1 . Processor issues read or 



write request 

- 

__ 2. External system controls 
acceptance of requests by 
asserting RdRdy* or WrRdy* 


Figure 12.8 Processor Request 
Processor Read Request 

When a processor issues a read request, the external agent must access 
the specified resource and return the requested data. (Processor read 
requests are described in this section; external read requests are 
described in “External Requests” on page 12-9.) 

A processor read request can be split from the external agent’s return of 
the requested data; in other words, the external agent can initiate an 
unrelated external request before it returns the response data for a 
processor read. A processor read request is completed after the last word 
of response data has been received from the external agent. 

Note that the data identifier (see “System Interface Commands and Data 
Identifiers” on page 12-32) associated with the response data can signal 
that the returned data is erroneous, causing the processor to take a bus 
error. 

Processor read requests that have been issued, but for which data has 
not yet been returned, are said to be pending. A read remains pending 
until the requested read data is returned. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor read request any time the following two conditions 
are met: 

• There is no processor read request pending. 

• The signal RdRdy* has been asserted for one clock cycle, two cycles 
before the issue cycle. 

Processor Write Request 

When a processor issues a write request, the specified resource is 
accessed and the data is written to it. (Processor write requests are 
described in this section; external write requests are described in 
“External Requests” on page 12-9.) 

A processor write request is complete after the last word of data has 
been transmitted to the external agent. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor write request any time the following two conditions 
are met: 

• No processor read request is pending. 

• The signal WrRdy* has been asserted for one clock cycle, two cycles 
before the issue cycle. 
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The R4600/R4700 has added two new modes to enhance the 
throughput of non-block writes. These modes allow for 2 cycle throughput 
on back-to-back non-block writes. The actual protocol is discussed in the 
write protocol section of this chapter. The external agent must be capable 
of accepting a processor write request in these modes under the same 
conditions as for the R4x00 compatibility mode (except as explained in 
the protocol section. 

External Requests 

External requests include read, write and null requests, as shown in 
Figure 12.9. This section also includes a description of read response, a 
special case of an external request. 



Figure 12.9 External Requests 


Read request asks for a word of data from the processor’s internal 
resource. 

Write request provides a word of data to be written to the processor’s 
internal resource. 

Null request requires no action by the processor; it provides a mecha- 
nism for the external agent to return control of the system interface to the 
master state without affecting the processor. 

The processor controls the flow of external requests through the arbi- 
tration signals ExtRqst* and Release*, as shown in Figure 12.10. The 
external agent must acquire mastership of the system interface before it 
is allowed to issue an external request; the external agent arbitrates for 
mastership of the system interface by asserting ExtRqst* and then 
waiting for the processor to assert Release* for one cycle. 
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Figure 12.10 External Request 
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Mastership of the system interface always returns to the processor after 
an external request is issued. The processor does not accept a subse- 
quent external request until it has completed the current request. 

If there are no processor requests pending, the processor decides, based 
on its internal state, whether to accept the external request, or to issue a 
new processor request. The processor can issue a new processor request 
even if the external agent is requesting access to the system interface. 

The external agent asserts EztRqst* indicating that it wishes to begin 
an external request. The external agent then waits for the processor to 
signal that it is ready to accept this request by asserting Release*. The 
processor signals that it is ready to accept an external request based on 
the criteria listed below. 

• The processor completes any processor request that is in progress. 

• While waiting for the assertion of RdRdy* to issue a processor read 
request, the processor can accept an external request if the request is 
delivered to the processor one or more cycles before RdRdy* is assert- 
ed. 

• While waiting for the assertion of WrRdy* to issue a processor write 
request, the processor can accept an external request provided the re- 
quest is delivered to the processor one or more cycles before WrRdy* 
is asserted. 

• If waiting for the response to a read request after the processor has 
made an uncompelled change to a slave state, the external agent can 
issue an external request before providing the read response data. 

External Read Request 

In contrast to a processor read request, data is returned directly in 
response to an external read request; no other requests can be issued 
until the processor returns the requested data. An external read request 
is complete after the processor returns the requested word of data. 

The data identifier (see “System Interface Commands and Data Identi- 
fiers” on page 12-32) associated with the response data can signal that 
the returned data is erroneous, causing the processor to take a bus error. 

Note: The R4600/R4700 does not contain any resources that are 
readable by an external read request; in response to an external read 
request the processor returns undefined data and a data identifier with 
its Erroneous Data bit, SysCmd(5), set. 

External Write Request 

When an external agent issues a write request, the specified resource is 
accessed and the data is written to it. An external write request is 
complete after the word of data has been transmitted to the processor. 

The only processor resource available to an external write request is the 
IP field of the Cause register. 

Read Response 

A read response returns data in response to a processor read request, 
as shown in Figure 12.11. While a read response is technically an 
external request, it has one characteristic that differentiates it from all 
other external requests — it does not perform system interface arbitration. 
For this reason, read responses are handled separately from all other 
external requests, and are simply called read responses. When a read 
response comes back with bad parity for the first datum, a cache error 
exception results. 
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Figure 12.11 Read Response 


Handling Requests 

This section details the sequence, protocol and syntax (see “Termi- 
nology” on page 12-1 for definitions of these terms) of both processor and 
external requests. The following system events are discussed: 

• load miss (no-secondary-cache mode) 

• store miss (no-secondary-cache mode) 

• store hit 

• uncached loads/stores 

• CACHE operations 

• load linked store conditional. 

Load Miss 

When a processor load misses in the primary cache, before the 
processor can proceed it must obtain the cache line that contains the 
data element to be loaded from the external agent. 

If the new cache line replaces a current cache line with a W bit set, the 
current cache line must be written back. 

The processor examines the coherency attribute (cache coherency 
attributes are described in Chapter 1 1) in the TLB entry for the page that 
contains the requested cache line, and executes the following request: 

• The coherency attribute is noncoherent, the processor issues a non- 
coherent read request. 

Table 12.1 shows the actions taken on a load miss to primary cache. 


Page Attribute 

State of Data Cache Line Being Replaced 

Clean /Invalid 

Dirty (W=l) 

Noncoherent 

NCR 

NCR/W 


NCR Processor noncoherent block read request 

NCR/W Processor noncoherent block read request followed by processor 
block write request 


Table 12.1 Load Miss to Primary Cache 
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No-Secondary-Cache Mode — Load Miss 

In no-secondaiy-cache mode, if the cache line must be written back on 
a load miss, the read request is issued and completed before the write 
request is handled. The processor takes the following steps: 

1 . The processor issues a noncoherent read request for the cache line 
that contains the data element to be loaded. 

2. The processor then waits for an external agent to provide the read 
response. 

3. The processor will restart the pipeline after the first doubleword (the 
data that missed is fetched first). The rest of the data cache line will be 
placed into the cache in parallel. 

If the current cache line must be written back, the processor issues a 
write request to save the dirty cache line in memory. 

Store Miss 

When a processor store misses in the primary cache, the processor may 
request, from the external agent, the cache line that contains the target 
location of the store for pages that are either write-back or write-through 
with write-allocate only. The processor examines the coherency attribute 
in the TLB entry for the page (TLB page coherency attributes are listed in 
Chapter 4) that contains the requested cache line to see if the line is 
write-allocate or no-write-allocate. 

The processor then executes one of the following requests: 

• If the coherency attribute is noncoherent, write-back or noncoherent, 
write-through with write-allocate, a noncoherent block read request 
is issued. 

• If the coherency attribute is noncoherent, write-through with no 
write-allocate, the processor issues a non-block write request. 

Table 12.1 shows the actions taken on a store miss to the primary 
cache. 


Page Attribute 

State of Data Cache Line Being Replaced 

Clean/Invalid 

Dirty (W=l) 

Noncoherent, write-back or 
Noncoherent, write-through with 
write-allocate 

NCR 

NCR/W 

Noncoherent, write-through with 
no write-allocate 

NCW 

NA 

NCR Processor noncoherent block read request 

NCR/W Processor noncoherent block read request followed by processor 
block write request 

NCW Processor noncoherent write request 


Table 12.2 Store Miss to Primary Cache 


No-Secondary-Cache Mode — Store Miss 

If the coherency attribute is write-back or write-through with write-allo- 
cate, the processor issues a read request for the cache line that contains 
the data element to be loaded, then awaits the external agent to provide 
read data in response to the read request. Then, if the current cache line 
must be written back, the processor issues a write request for the current 
cache line. For a write- through, no write-allocate store miss, the 
processor issues a write request only. 
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In no-secondary-cache mode, If the new cache line replaces a current 
cache line whose Write back (W) bit is set, the current cache line moves to 
an internal write buffer before the new cache line is loaded in the primary 
cache. 

Store Hit 

This section describes store hits in no-secondary-cache mode for both 
write-back and write-through lines. 

No-Secondary-Cache Mode — Store Hit 

In no-secondary-cache mode, the action on the system interface will be 
determined by whether the line is write-back or write-through. All lines 
that use a write-back policy are set to the dirty exclusive cache state and 
there is no bus transactions generated. For lines with a write-through 
policy, the store will also generate a processor write request for the store 
data. 

Uncached Loads or Stores 

When the processor performs an uncached load, it issues a nonco- 
herent word read request (the actual access can be for a doubleword, 
word, partial word or byte, but the request is called a word read request 
to differentiate it from the block read request). When the processor 
performs an uncached store, it issues a doubleword, partial doubleword, 
word, or partial word write request. 

The CPU expects valid parity and data in the full SysAD bus (all 64 
bits), even if it is looking for less than a double word. Even if you do not 
want to return the full double word, you still must tell it not to check the 
parity if you are not using all 64 bits. In other words, either return 64 
bits with parity, or tell it not to check parity. 

All writes by the processor will be buffered from the system interface by 
the 4-deep write buffer. The write requests are sent to the system inter- 
face when there are no other requests in progress. If the write buffer 
contains any entries when a block request is needed, the write buffer is 
first flushed before any read request will occur (cache miss or uncached 
load). 

Both a data cache miss and an uncached data load will flush the write 
buffer. 

CACHE Operations 

The processor provides a variety of CACHE operations to maintain the 
state and contents of the primary cache. During the execution of the 
CACHE operation instructions, the processor can issue write requests. 
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Load Linked/Store Conditional Operation 

Generally, the execution of a Load Linked/Store Conditional Instruction 
sequence is not visible at the system interface; that is, no special requests 
are generated due to the execution of this instruction sequence. 

There is, however, one situation in which the execution of a Load 
Linked/Store Conditional instruction sequence is visible, as indicated by 
the link address retained bit during a processor read request, as 
programmed by the SysCmd(2) bit. This situation occurs when the data 
location targeted by a Load-Linked-Store-Conditional instruction 
sequence maps to the same cache line to which the instruction area 
containing the Load Linked/Store Conditional code sequence is mapped. 
In this case, immediately after executing the Load Linked instruction, the 
cache line that contains the link location is replaced by the instruction 
line containing the code. The link address is kept in a register separate 
from the cache, and remains active as long as the link bit, set by the Load 
Linked instruction, is set. 

The link bit, which is set by the load linked instruction, is cleared by a 
change of cache state for the line containing the link address, or by a 
Return From Exception. 

For more information, refer to Chapter 11, or see the specific Load 
Linked and Store Conditional instructions described in Appendix A. 

Processor and External Request Protocols 

The following sections contain a cycle-by-cycle description of the bus 
arbitration protocols for each type of processor and external request. 
Table 12.3 lists the abbreviations and definitions for each of the buses 
that are used in the timing diagrams that follow. 


Scope 

Abbreviation 

Meaning 

Global 

Unsd 

Unused 

SysAD bus 

Addr 

Physical address 

Data<n> 

Data element number n of a block of data 

%sCmd bus 

Cmd 

An unspecified system interface command 

Read 

A processor or external read request command 

Write 

A processor or external write request command 

SINull 

A system interface release external null request 
command 

NData 

A noncoherent data identifier for a data element 
other than the last data element 

NEOD 

A noncoherent data identifier for the last data 
element 


Table 12.3 System Interface Requests 


Processor Request Protocols 

Processor request protocols described in this section include: 

• read 

• write 

Note: In the timing diagrams, the two closely spaced, wavy vertical 
lines (see SCycle 2 in Figure 12.20 on page 12-24) indicate one or more 
identical cycles. 
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Processor Read Request Protocol Steps 

The following sequence describes the protocol for a processor read 
request (the numbered steps below correspond to the numbers In 
Figure 12.12 on page 12-16). 

1. RdRdy* is asserted low, indicating the external agent is ready to 
accept a read request. 

2. With the system interface in master state, a processor read request is 
issued by driving a read command on the SysCmd bus and a read address 
on the SysAD bus. 

3. At the same time, the processor asserts ValidOut* for one cycle, 
indicating valid data is present on the SysCmd and the SysAD buses. 

Note: Only one processor read request can be pending at a time. 

4. The processor makes an uncompelled change to slave state at the 
issue cycle of the read request by asserting the Release* signal for one 
cycle. 

Note: The external agent must not assert the signal ExtRqst* for the 
purposes of returning a read response, but rather must wait for the 
uncompelled change to slave state. The signal ExtRqst* can be asserted 
before or during a read response to perform an external request other than 
a read response. 

5. The processor releases the SysCmd and the SysAD buses one SCycle 
after the assertion of Release*. 

6. The external agent drives the SysCmd and the SysAD buses within 
two cycles after the assertion of Release*. 

Once in slave state (starting at cycle 5 in Figure 12.12), the external 
agent can return the requested data through a read response. The read 
response can return the requested data or, if the requested data could not 
be successfully retrieved, an indication that the returned data is erro- 
neous. If the returned data is erroneous, the processor takes a bus error 
exception. 

Note: The R4600/R4700 only check the error bit for the first 
doubleword of read response data, all other error bits are ignored. 
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Figure 12.12 illustrates a processor read request, coupled with an 
uncompelled change to slave state. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



The assertion of Release* indicates either an uncompelled change to 
slave state, or a response to the assertion of ExtRqst*, whereupon the 
processor accepts either a read response, or any other external request. 
If any external request other than a read response is issued, the 
processor performs another uncompelled change to slave state after 
processing the external request. 

The actual read response, where the external agent returns the 
requested data, is shown later in this chapter. 

External Instruction Read Response Time 

The R4600/R4700 accesses the external bus due to instruction cache 
miss or an uncached reference. The length of time for an external read is 
based on the overhead at the beginning and end of the read along with the 
time to drive the address and get the response data. 
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Instruction Read Latency Steps for System Clock 

The read latency for a system clock In the divide-by-two mode is as 
follows: 

1 . The startup overhead is one to two pipeline cycles (PCycle) for the CPU 
to transfer the address to the pads to be output. The second PCycle is 
needed if the miss is detected on a PCycle not aligned with the rising edge 
of SClock. 

2. The CPU drives the address on the SysAD bus for two PCycles. 

3. The CPU tri-states the SysAD bus for two PCycles. 

4. The CPU waits for the main memory to return the data. This is 
expressed as nx 2 PCycles. 

5. The first double word is driven in the SysAD from the main memory 
for two PCycles. 

6. The remaining three double words of instruction are driven on 
SysAD for 3*2 PCycles. 

Notes on the Instruction Read Latency Steps: 

a. For instruction misses the pipeline starts after all the instructions are 
returned. 

b. n is the total number of idle cycles (even between double word 
instruction). For zero wait state systems, n = 0. 

Example of Instruction Block Read With Zero Wait State 

The following example shows an instruction block read with a zero wait 
state: 

StepDescriptionPCycles 

1 . CPU overhead for cache miss detection: 1-2 

2. Address driven on SysAD bus: 2 

3. SysAD bus tri-stated:2 

4. Memory latency to return the data:0*2 

5. First double word driven on SysAD bus:2 

6. Remaining three instructions returned:2*3=6 

Total PCycles: 13-14 

External Data Read Response Time 

The R4600/R4700 accesses the external bus due to data cache miss or 
an uncached reference. The length of time for an external read is based 
on the overhead at the beginning and end of the read along with the time 
to drive the address and get the response data. 
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Data Read Latency Steps for System Clock 

The read latency for a system clock in the divide-by-two mode is as 
follows: 

1. The startup overhead is one to two pipeline cycles (PCycle) for the 
CPU to generate the parity for the address to be output. The second PCycle 
is needed if the miss is detected or a PCycle not aligned with the rising edge 
of SClock. 

2. The CPU drives the address on the SysAD bus for two PCycles. 

3. The CPU tri-states the SysAD bus for two PCycles. 

4. The CPU waits for the main memoiy to return the data. This is 
expressed as n x 2 PCycles where n is the number of SClock cycles for the 
first data to be returned in a block read, or the latency for the single read. 
For zero wait state memory system n should be zero. 

5. The first double word is driven in the SysAD from the main memoiy 
for two PCycles. 

6. The end of the overhead is two PCycles: one to transfer the data from 
the pads and generate the parity, and one to write to the register (or cache, 
if it is cacheable data). 

Notes on the Data Read Latency Steps: 

a. If n=0 and the line being replaced is dirty, the CPU takes one to two 
additional PCycles of overhead to move the dirty data into the write 
buffer. 

b. The additional latency for returning the remaining three data 
elements should be added in a similar fashion. 

c. If cache line needs to be written back the read request is posted first, 
then the write is completed. 

Example of Data Single Read With Zero Wait State 

The following example shows a data block read with a zero wait state: 
StepDescriptionPCycles 

1 . CPU overhead for cache miss detection: 1-2 

2. Address driven on SysAD bus: 2 

3. SysAD bus tri-stated:2 

4. Memory latency to return the data:0*2 

5. First double word driven on SysAD bus: 2 

6. CPU overhead to write the data cache, 
do the fixup, and then restart:2 

Total PCycles:9-10 

External Cycles for Read Latency 

The external cycles to get the response data will look similar to Figure 
12.13. For a larger “divide by” it will take longer to get the response data. 



PCIk 

SClock | J \ / 

SysAD Bus |~) ( Addr ) - 


VJ~ 


- ( Data 


Figure 12.13 Uncached Read — External Cycles 
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The same operation is shown in greater detail in Figure 12.14. These 
figures assume the following: 

1. Data is returned immediately after the Release* is asserted, and after 
the bus turn-around cycle (when the CPU tri-states the bus to allow the 
external agent to drive it). 

2. The data meets the setup and hold requirements for the rising edge 
of the SClock that is identified in the preceding and following figures with 
an asterisk. 



Processor Write Request Protocol 

Processor write requests are issued using one of two protocols. 

• Doubleword, partial doubleword, word, or partial word writes use a 
word 1 write request protocol. 

• Block writes use a block write request protocol. 

Processor word write requests are issued with the system interface in 
master state, as described in the following steps. Figure 12.15 shows a 
processor noncoherent word write request cycle. 

1 . A processor single word write request is issued by driving a write 
command on the SysCmd bus and a write address on the SysAD bus. 

2. The processor asserts ValidOut*. 

3. The processor drives a data identifier on the SysCmd bus and data 
on the SysAD bus. 

4. The data identifier associated with the data cycle must contain a last 
data cycle indication. At the end of the cycle, ValidOut* is deasserted. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 


L Called word to distinguish it from block request protocol. Data transferred can 
actually be doubleword, partial doubleword, word, or partial word. 
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The R4600/R4700 interface requires that WrRdy* be asserted two 
system cycles prior to the issue of a write, for one clock cycle. An external 
agent that deasserts WrRdy* immediately upon receiving the write that 
fills its buffer will stop a subsequent write for four system cycles in R4000 
non-block write compatible mode. This leaves two null system cycles after 
a write address/data pair to give the external agent time to stop the next 
write. This is illustrated in Figure 12.6 on page 12-7. 

An Address/data pair every four system cycles is not sufficiently high 
performance for all applications. For this reason, the R4600/R4700 
provides two new protocol options that modify the R4000 back-to-back 
write protocol to allow an address/ data pair every two system cycles. The 
first protocol, called write re-issue, allows WrRdy* to be deasserted during 
the address cycle and forces a write to be re-issued. The second, called 
pipelined writes, leaves the sample point of WrRdy* unchanged and 
requires that the external agent accept one more write than the R4000 
protocol. 
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The write re-issue protocol is shown in Figure 12.16. Writes issue when 
WrRdy* is asserted both two cycles prior to the address cycle and during 
the address cycle. 



The pipelined write protocol is shown in Figure 12.17. This protocol 
maintains the R4000 write issue rule (issue if WrRdy* asserted two cycles 
prior to the address cycle, for one clock cycle), but simply eliminates the 
two null cycles between writes. The external agent is then required to 
accept one more write after it deasserts WrRdy*. 



All three write protocols apply for both single write and block writes. 
This means that in pipeline write, for example, a single write can be 
followed immediately by a block write that the external agent must 
accept. 

Processor block write requests are issued with the system interface in 
master state, as described below; a processor noncoherent block request 
for eight words of data is illustrated in Figure 12. 18 on page 12-22. 

1. The processor issues a write command on the SysCmd bus and a 
write address on the SysAD bus 

2. The processor asserts VaUdOut*. 

3. The processor drives a data identifier on the SysCmd bus and data 
on the SysAD bus. 

4. The processor asserts ValidOut* for a number of cycles sufficient to 
transmit the block of data. 

5. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 


12-21 













System Interface 


Chapter 12 


Figure 12.18 illustrate a processor noncoherent block request for eight 
words of data with a data pattern of DDDD. 



Processor Request and Flow Control 

The external agent uses RdRdy* to control the flow of processor read 
requests. Figure 12.19 on page 12-23 Illustrates this flow control, as 
described In the steps below. 

1. The processor samples the signal RdRdy* to determine if the external 
agent is capable of accepting a read request. 

2. The signal WrRdy* controls the flow of a processor write request. 

3. The processor does not complete the issue of a read request, until it 
issues an address cycle in response to the request for which the signal 
RdRdy* was asserted two cycles earlier. 

4. The processor does not complete the issue of a write request until it 
issues an address cycle in response to the write request for which the 
signal WrRdy* was asserted two cycles earlier. 
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Figure 12.19 illustrates two processor write requests in which the issue 
of the second is delayed for the assertion of WrRdy*. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



Figure 12. 19 Two Processor Write Requests, Second Write Delayed for the Assertion of 

WrRdy* 


External Request Protocols 

External requests can only be issued with the system interface in slave 
state. An external agent asserts ExtRqst* to arbitrate (see “External 
Arbitration Protocol” on page 12-24) for the system interface, then waits 
for the processor to release the system interface to slave state by 
asserting Release* before the external agent issues an external request. 
If the system interface is already in slave state — that is, the processor has 
previously performed an uncompelled change to slave state — the external 
agent can begin an external request immediately. 

After issuing an external request, the external agent must return the 
system interface to master state. If the external agent does not have any 
additional external requests to perform, ExtRqst* must be deasserted 
two cycles after the cycle in which Release* was asserted. For a string of 
external requests, the ExtRqst* signal is asserted until the last request 
cycle, whereupon it is deasserted two cycles after the cycle in which 
Release* was asserted. 

The processor continues to handle external requests as long as 
ExtRqst* is asserted: however, the processor cannot release the system 
interface to slave state for a subsequent external request until it has 
completed the current request. As long as ExtRqst* is asserted, the 
string of external requests is not interrupted by a processor request. 

This section describes the following external request protocols: 

• read 

• null 

• write 

• read response 
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External Arbitration Protocol 

System interface arbitration uses the signals ExtRqst* and Release* as 
described above. Figure 12.20 is a timing diagram of the arbitration 
protocol, in which slave and master states are shown. 

The arbitration cycle consists of the following steps: 

1. The external agent asserts ExtRqst* when it wishes to submit an 
external request. 

2. The processor waits until it is ready to handle an external request, 
whereupon it asserts Release* for one cycle. 

3. The processor sets the SysAD and SysCmd buses to tri-state. 

4. The external agent must begin driving the SysAD bus and the 
SysCmd bus two cycles after the assertion of Release*. 

5. The external agent deasserts ExtRqst* two cycles after the assertion 
of Release*, unless the external agent wishes to perform an additional 
external request. 

6. The external agent sets the SysAD and the SysCmd buses to tri-state 
at the completion of an external request. 

The processor can start issuing a processor request one cycle after the 
external agent sets the bus to tri-state. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



External Read Request Protocol 

External reads are requests for a word of data from a processor internal 
resource, such as a register. External read requests cannot be split; that 
is, no other request can occur between the external read request and its 
read response. 
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Figure 12.21 shows a timing diagram of an external read request, which 
consists of the following steps: 

1. An external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 
Release* for one cycle and then deasserting Release*. 

3. After Release* is deasserted, the SysAD and SysCmd buses are set 
to a tri-state for one cycle. 

4. The external agent drives a read request command on the SysCmd 
bus and a read request address on the SysAD bus and asserts Validln* for 
one cycle. 

5. After the address and command are sent, the external agent releases 
the SysCmd and SysAD buses by setting them to tri-state and allowing the 
processor to drive them. The processor, having accessed the data that is 
the target of the read, returns this data to the external agent. The 
processor accomplishes this by driving a data identifier on the SysCmd 
bus, the response data on the SysAD bus, and asserting ValidOut* for one 
cycle. The data identifier indicates that this is last-data-cycle response 
data. 

6. The system interface is in master state. The processor continues 
driving the SysCmd and SysAD buses after the read response is returned. 

Note: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

External read requests are only allowed to read a word of data from the 
processor. The processor response to external read requests for any data 
element other than a word is undefined. 



External Null Request Protocol 

The R4600/R4700 only supports one external null request. A system 
interface release external null request returns the system interface to 
master state from slave state without otherwise affecting the processor. 
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External null requests require no action from the processor other than 
to return the system Interface to master state. 

Figure 12.22 show timing diagram of the external null request cycle, 
which consist of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 

Release*. 

3. The external agent drives a system interface release external null 
request command on the SysCmd bus, and asserts Validln* for one cycle 
to return the system interface back to master state. 

4. The SysAD bus is unused (does not contain valid data) during the 
address cycle associated with an external null request. 

5. After the address cycle is issued, the null request is complete. 

For a system interface release external null request, the external agent 
releases the SysCmd and SysAD buses, and expects the system interface 
to return to master state. 



External Write Request Protocol 

External write requests use a protocol identical to the processor single 
word write protocol except the Validln* signal is asserted instead of 
ValidOut*. Figure 12.23 on page 12-27 shows a timing diagram of an 
external write request, which consists of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the system 
interface. 

2. The processor releases the system interface to slave state by asserting 

Release*. 

3. The external agent drives a write command on the SysCmd bus, a 
write address on the SysAD bus, and asserts Validln*. 

4. The external agent drives a data identifier on the SysCmd bus, data 
on the SysAD bus, and asserts Validln*. 

5. The data identifier associated with the data cycle must contain a 
coherent or noncoherent last data cycle indication. 

6. After the data cycle is issued, the write request is complete and the 
external agent sets the SysCmd and SysAD buses to a tri-state, allowing 
the system interface to return to master state. Timings for the SysADC 
and SysCmdP buses are the same as those of the SysAD and SysCmd 
buses, respectively. 
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External write requests are only allowed to write a word of data to the 
processor. Processor behavior in response to an external write request for 
any data element other than a word is undefined. 



Read Response Protocol 

An external agent must return data to the processor in response to a 
processor read request by using a read response protocol. A read 
response protocol consists of the following steps: 

1. The external agent waits for the processor to perform an uncompelled 
change to slave state. 

2. The external agent returns the data through a single data cycle or a 
series of data cycles. 

3. After the last data cycle is issued, the read response is complete and 
the external agent sets the SysCmd and SysAD buses to a tri-state. 

4. The system interface returns to master state. 

Note: The processor always performs an uncompelled change to slave 
state in the same cycle that it issues a read request. 

5. The data identifier for data cycles must indicate the fact that this data 
is response data. 

6. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 

For read responses to non-coherent block read requests (which is the 
only read request for normal operations of the R4600/R4700,) the 
response data will not need to identify an initial cache state. The cache 
state will automatically be assigned as dirty exclusive by the R4600/ 
R4700. 

The data identifier associated with a data cycle can indicate that the 
data transmitted during that cycle is erroneous: however, an external 
agent must return a data block of the correct size regardless of the fact 
that the data may be in error. The R4600/R4700 only checks the error bit 
for the first doubleword of a block, the other error bits for the block of 
data are ignored If an initial erroneous data cycle is detected, the 
processor takes a bus error at the completion of the data transfer. 
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Read response data must only be delivered to the processor when a 
processor read request is pending. The behavior of the processor is unde- 
fined when a read response is presented to it and there is no processor 
read pending. 

Figure 12.24 illustrates a processor word read request followed by a 
word read response. Figure 12.25 illustrates a read response for a 
processor block read with the system interface already in slave state. 
Figure 12.26 illustrates a block read transaction with one wait state. 

Note: Timings for the Sy&ADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



SCycle 

SCIock 

SysAD Bus 

SysCmd Bus 

ValidOut* 

Validln* 

ExtRqst* 

Release* 

RdRdy* 
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Data Rate Control 

The system interface supports a maximum data rate of one doubleword 
per cycle. The data rate the processor can support is directly related to 
the rate at which the external agent can accept data. 

Read Data Pattern 

The rate at which data is delivered to the processor can be determined 
by the external agent — for example, the external agent can drive data and 
assert Validln* every n cycles, instead of every cycle. An external agent 
can deliver data at any rate it chooses, but must not deliver data to the 
processor any faster than the processor is capable of receiving it. 

The processor only accepts cycles as valid when Validln* is asserted 
and the SysCmd bus contains a data identifier. If the external agent 
sends more data items then requested (e.g., a fifth doubleword of read 
response data with Validln* asserted) or the last data (i.e., the fourth 
doubleword) of a block read is not tagged as the last data item, it is an 
error and the resulting actions of the processor for these cases will be 
undefined. 
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Figure 12.27 shows a read response with reduced data rate and with 
the system interface in slave state. 



Write Data Transfer Patterns 

The write data pattern specifies the pattern the R4600/R4700 uses 
when writing a block to the external agent. This pattern is specified 
through the mode bits. 

A data pattern is a sequence of letters indicating the data and unused 
cycles that repeat to provide the appropriate data rate. For example, the 
data pattern DDxx specifies a repeatable data rate of two doublewords 
eveiy four cycles, with the last two cycles unused. 

Table 12.4 lists the maximum processor data rate and the data pattern 
for each data rate. 


Maximum Data Transmit Rate Block writes 

Data Pattern 

1 Double/l SClock Cycle 

DDDD 

2 Doubles /3 SClock Cycles 

DDxDDx 

1 Double/2 SClock Cycles 

DDxxDDxx 

1 Double/2 SClock Cycles 

DxDxDxDx 

2 Doubles/5 SClock Cycles 

DDxxxDDxxx 

1 Double/3 SClock Cycles 

DDxxxxDDxxxx 

1 Double/3 SClock Cycles 

DxxDxxDxxDxx 

1 Double/ 4 SQock Cycles 

DDxxxxxxDDxxxxxx 

1 Double/4 SClock Cycles 

DxxxDxxxDxxxDxxx 


Table 12.4 Transmit Data Rates and Patterns 


In Table 12.4 data patterns are specified using the letters D and x; D 
indicates a data cycle and x indicates an unused cycle. During the 
unused cycles, the data bus will maintain the last data value (D). 
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Independent Transmissions on the SysAD Bus 

In most applications, the SysAD bus is a point-to-point connection, 
running from the processor to a bidirectional registered transceiver 
residing in an external agent. For these applications, the SysAD bus has 
only two possible drivers, the processor or the external agent. 

Certain applications may require connection of additional drivers and 
receivers to the SysAD bus, to allow transmissions over the SysAD bus 
that the processor is not involved in. These are called independent trans- 
missions. To effect an independent transmission, the external agent must 
coordinate control of the SysAD bus by using arbitration handshake 
signals and external null requests. 

An independent transmission on the SysAD bus follows this procedure: 

1 . The external agent requests mastership of the SysAD bus, to issue an 
external request. 

2. The processor releases the system interface to slave state. 

3. The external agent then allows the independent transmission to take 
place on the SysAD bus, making sure that Validln* is not asserted while 
the transmission is occurring. 

4. When the transmission is complete, the external agent must issue a 
system interface release external null request to return the system interface 
to master state. 

System Interface Endianness 

The endianness of the system interface is programmed at boot time 
through the boot-time mode control interface (see chapter 9, Initialization 
Interface), and remains fixed until the next time the processor boot-time 
mode bits are read. Software cannot change the endianness of the system 
interface and the external system: software can set the reverse endian bit 
to reverse the interpretation of endianness inside the processor, but the 
endianness of the system interface remains unchanged. 

System Interface Cycle Time 

The processor specifies minimum and maximum cycle counts for 
various processor transactions and for the processor response time to 
external requests. Processor requests themselves are constrained by the 
system interface request protocol, and request cycle counts can be deter- 
mined by examining the protocol. The following system interface interac- 
tions can vary within minimum and maximum cycle counts: 

• waiting period for the processor to release the system interface to 
slave state in response to an external request ( release latency) 

• response time for an external request that requires a response ( exter- 
nal response latency). 

The remainder of this section describes and tabulates the minimum and 
maximum cycle counts for these system interface interactions. 
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Release Latency 

Release latency is generally defined as the number of cycles the 
processor can wait to release the system interface to slave state for an 
external request. When no processor requests are in progress, internal 
activity can cause the processor to wait some number of cycles before 
releasing the system interface. Release latency is therefore more specifi- 
cally defined as the number of cycles that occur between the assertion of 
ExtRqst* and the assertion of Release*. 

There are three categories of release latency: 

• Category 1: when the external request signal is asserted two cycles 
before the last cycle of a processor request. 

• Category 2: when the external request signal is not asserted during a 
processor request, or is asserted during the last cycle of a processor 
request. 

• Category 3: when the processor makes an uncompelled change to 
slave state. 

Table 12.5 summarizes the minimum and maximum release latencies 
for requests that fall into categories 1, 2 and 3. Note that the maximum 
and minimum cycle count values are subject to change. 


Category 

Minimum PCycles 

Maximum PCycles 

i 

4 

6 

2 

4 

24 

3 

0 

0 


Table 12.5 Release Latency for External Requests 


The differences in the minimum and maximum times are due to 
internal conditions not readily observable externally. 

System Interface Commands and Data Identifiers 

System interface commands specify the nature and attributes of any 
system interface request: this specification is made during the address 
cycle for the request. System interface data identifiers specify the 
attributes of data transmitted during a system interface data cycle. 

The following sections describe the syntax, that is, the bitwise encoding 
of system interface commands and data identifiers. 

Reserved bits and reserved fields in the command or data identifier 
should be set to 1 for system interface commands and data identifiers 
associated with external requests. For system interface commands and 
data identifiers associated with processor requests, reserved bits and 
reserved fields in the command and data identifier are undefined. 

Command and Data Identifier Syntax 

System interface commands and data identifiers are encoded in 9 bits 
and are transmitted on the SysCmd bus from the processor to an 
external agent, or from an external agent to the processor, during address 
and data cycles. Bit 8 (the most-significant bit) of the SysCmd bus deter- 
mines whether the current content of the SysCmd bus is a command or a 
data identifier and, therefore, whether the current cycle is an address 
cycle or a data cycle. For system interface commands, SysCmd(8) must 
be set to 0. For system interface data identifiers, SysCmd(8) must be set 
to 1. 
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System Interface Command Syntax 

This section describes the SysCmd bus encoding for system interface 
commands. Figure 12.28 shows a common encoding used for all system 
interface commands. 


8 7 


5 4 


Request Type 


Request Specific 


Figure 12.28 System Interface Command Syntax Bit Definition 


SysCmd(8) must be set to 0 for all system interface commands. 
SysCmd(7:5) specify the system interface request type which may be 
read, write or null; Table 12.6 lists the encoding of SysCmd(7:5). 

Table 12.6 shows the types of requests encoded by the SysCmd(7:5) 
bits. 


SysCmd(7:5) 

Command 

0 

Read Request 

1 

Reserved 

2 

Write Request 

3 

Null Request 

4 - 7 

Reserved 


Table 12.6 Encoding of SysCmd(7:5) for System Interface Commands 


SysCmd(4:0) are specific to each type of request and are defined in 
each of the following sections. 

Read Requests 

Figure 12.29 shows the format of a SysCmd read request. 


8 7 5 4 3 2 1 0 



000 

Read Request Specific 



(see tables) 

i i 


Figure 12.29 Read Request SysCmd Bus Bit Definition 
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Table 12.7, Table 12.8, and Table 12.9 list the encoding of SysCmd(4:0) 
for read requests. 


SysCmd(4:3) 

Read Attributes 

0- 1 

Reserved 

2 

Noncoherent block read 

3 

Doubleword, partial doubleword, word, or partial word 


Table 12.7 Encoding of SysCmd(4:3) for Read Requests 


SysCmd(2) 

Link Address Retained Indication 

0 

Link address not retained 

1 

Link address retained 

SysCmd(l:0) 

Read Block Size 

0 

Reserved 

1 

8 words 

2-3 

Reserved 


Table 12.8 Encoding of SysCmd(2:0) for Block Read Request 


SysCmd(2:0) 

Read Data Size 

0 

1 byte valid (Byte) 

1 

2 bytes valid (Halfword) 

2 

3 bytes valid (Tribyte) 

3 

4 bytes valid (Word) 

4 

5 bytes valid (Quintibyte) 

5 

6 bytes valid (Sextibyte) 

6 

7 bytes valid (Septibyte) 

7 

8 bytes valid (Doubleword) 


Table 12.9 Doubleword, Word, or Partial-word Read Request Data Size 
Encoding of SysCmd(2:0) 

Write Requests 

Figure 12.30 shows the format of a SysCmd write request. 
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Table 12.10 lists the write attributes encoded in bits SysCmd(4:3). 
Table 12.11 lists the block write replacement attributes encoded in bits 
SysCmd(2:0). Table 12.12 lists the write request bit encoding in 

SysCmd(2:0). 


SysCmd(4:3) 

Write Attributes 

0 

Reserved 

1 

Reserved 

2 

Block write 

3 

Doubleword, partial doubleword, word, or partial word 


Table 12.10 Write Request Encoding of SysCmd(4:3) 


SysCmd(2) 

Cache Line Replacement Attributes 

0 

Cache line replaced 

1 

Cache line retained 

SysCmd(l:0) 

Write Block Size 

0 

Reserved 

1 

8 words 

2-3 

Reserved 


Table 12.11 Block Write Request Encoding of SysCmd(2:0) 


SysCmd(2:0) 

Write Data Size 

0 

1 byte valid (Byte) 

1 

2 bytes valid (Halfword) 

2 

3 bytes valid (Tribyte) 

3 

4 bytes valid (Word) 

4 

5 bytes valid (Quintibyte) 

5 

6 bytes valid (Sextibyte) 

6 

7 bytes valid (Septibyte) 

7 

8 bytes valid (Doubleword) 


Table 12. 12 Doubleword, Word, or Partial-word Write Request Data Size 
Encoding of SysCmd(2:0) 
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Null Requests 

Figure 12.31 shows the format of a SysCmd null request. 


8 7 5 4 3 2 1 0 


0 

Oil 

Null Request Specific 

1 



(see table) 



Figure 12.31 Null Request SysCmd Bus Bit Definition 

System interface release external null requests use the null request 
command. Table 12.13 lists the encoding of SysCmd(4:3) for external 
null requests. SysCmd(2:0) are reserved for both instances of null 
requests. 


SysCmd(4:3) 

Null Attributes 

0 

System Interface release 

1 -3 

Reserved 


Table 12.13 External Null Request Encoding of SysCmd(4:3) 


System Interface Data Identifier Syntax 

This section defines the encoding of the SysCmd bus for system inter- 
face data identifiers. Figure 12.32 shows a common encoding scheme 
used for all system interface data identifiers. 


8 

7 

6 

5 

4 

3 2 0 

1 

Last 

Data 

Resp 

Data 

Good 

Data 

Data 

Check 

Reserved 



Figure 12.32 Data Identifier SysCmd Bus Bit Definition 

SysCmd(8) must be set to 1 for all system interface data identifiers. 

system interface data identifiers use the format for noncoherent data. 

Noncoherent Data 

Noncoherent data is defined as follows: 

• data that is associated with processor block write requests and pro- 
cessor doubleword, partial doubleword, word, or partial word write re- 
quests 

• data that is returned in response to a processor noncoherent block 
read request or a processor doubleword, partial doubleword, word, or 
partial word read request 

• data that is associated with external write requests 

• data that is returned in response to an external read request 
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Data Identifier Bit Definitions 

SysCmd(7) marks the last data element and SysCmd(6) Indicates 
whether or not the data Is response data, for both processor and external 
coherent and noncoherent data Identifiers. Response data is data 
returned in response to a read request. 

SysCmd(5) indicates whether or not the data element is error free. Erro- 
neous data contains an uncorrectable error and is returned to the 
processor, forcing a bus error. The processor delivers data with the good 
data bit deasserted if a primary parity error is detected for a transmitted 
data item. 

SysCmd(4) indicates to the processor whether to check the data and 
check bits for this data element. 

SysCmd(3) is reserved for external data identifiers. 

SysCmd(4:3) are reserved for noncoherent processor data identifiers. 

SysCmd(2:0) are reserved for noncoherent data identifiers. 

Table 12.14 lists the encoding of SysCmd(7:3) for processor data identi- 
fiers. 


SysCmd(7) 

Last Data Element Indication 

0 

Last data element 

1 

Not the last data element 

SysCmd(6) 

Response Data Indication 

0 

Data Is response data 

1 

Data Is not response data 

SysCmd(5) 

Good Data Indication 

0 

Data is error free 

1 

Data Is erroneous 

SysCmd(4:3) 

Reserved 


Table 12.14 Processor Data Identifier Encoding of SysCmd(7:3) 
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Table 12.15 lists the encoding of SysCmd(7:3) for external data identi- 
fiers. 


SysCmd(7) 

Last Data Element Indication 

0 

Last data element 

1 

Not the last data element 

SysCmd(6) 

Response Data Indication 

0 

Data is response data 

1 

Data Is not response data 

SysCmdf5j 

Good Data Indication 

0 

Data is error free 

1 

Data is erroneous 

SysCmd(4) 

Data Checking Enable 

0 

Check the data and check bits 

1 

Do not check the data and check bits 

SysCmd(3) 

Reserved 


Table 12.15 External Data Identifier Encoding of SysCmd(7:3) 


System Interface Addresses 

System interface addresses are full 36-bit physical addresses presented 
on the least-significant 36 bits (bits 35 through 0) of the SysAD bus 
during address cycles; the remaining bits of the SysAD bus are unused 
during address cycles. 

Addressing Conventions 

Addresses associated with doubleword, partial doubleword, word, or 
partial word transactions, are aligned for the size of the data element. 
The system uses the following address conventions: 

• Addresses associated with block requests are aligned to double-word 
boundaries; that is, the low-order 3 bits of address are 0. 

• Doubleword requests set the low-order 3 bits of address to 0. 

• Word requests set the low-order 2 bits of address to 0. 

• Halfword requests set the low-order bit of address to 0. 

• Byte, tribyte, quintibyte, sextibyte, and septibyte requests use the 
byte address. 

Subblock Ordering 

The order in which data is returned in response to a processor block 
read request is subblock ordering. In subblock ordering, the processor 
delivers the address of the requested doubleword within the block. An 
external agent must return the block of data using subblock ordering, 
starting with the addressed doubleword. 

A block of data elements (whether bytes, halfwords, words, or double- 
words) can be retrieved from storage in two ways: in sequential order, or 
using a subblock order. This section describes these retrieval methods, 
with an emphasis on subblock ordering. Note that the R4600/R4700 only 
uses subblock ordering for block reads. 
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Example of Sequential Ordering 

Sequential ordering retrieves the data elements of a block in serial, or 
sequential, order. 

Figure 12.33 shows a sequential order in which DWO is taken first and 
DW3 is taken last. 


DWO 

DW1 

DW2 

&m 



DW1 


taken second 0^2 

taken third 


Figure 12.33 Retrieving a Data Block in Sequential Order 


Example of Subblock Ordering 

Subblock ordering allows the system to define the order in which the 
data elements are retrieved. The smallest data element of a block transfer 
for the R4600/R4700 is a doubleword, and Figure 12.34 shows the 
retrieval of a block of data that consists of 4 doublewords (the cache line 
size is 8 words), in which DW2 is taken first. 



Using the subblock ordering shown in Figure 12.34, the doubleword at 
the target address is retrieved first (DW2), followed by the remaining 
doubleword (DW3) in this quadword. Next, the quadword that fills out the 
octalword are retrieved in the same order as the prior quadword (in this 
case DWO is followed by DW1). 
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It may be easier way to understand subblock ordering by taking a look 
at the method used for generating the address of each doubleword as it is 
retrieved. The subblock ordering logic generates this address by 
executing a bit-wise exclusive-OR (XOR) of the starting block address with 
the output of a binary counter that increments with each doubleword, 
starting at doubleword zero (OO 2 ). 

Using this scheme. Table 12.16, Table 12.17, and Table 12.18 list the 
subblock ordering of doublewords for an 8-word block, based on three 
different starting-block addresses: IO 2 , ll 2 > and OI 2 . The subblock 
ordering is generated by an XOR of the subblock address (either IO 2 , 1 12 . 
or 01 2 ) with the binary count of the doubleword (OO 2 through 1 12 ). Thus, 
the third doubleword retrieved from a block of data with a starting 
address of 10 2 is found by taking the XOR of address 10 2 with the binary 
count of DW2, IO 2 . The result is OO 2 , or DWO (shown in Table 12. 16). 


Cycle 

Starting Block 
Address 

Binary Count 

Double Word 
Retrieved 

1 

10 

00 

10 

2 

10 

01 

11 

3 

10 

10 

00 

4 

10 

11 

01 


Table 12.16 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 10 2 


Cycle 

Starting Block 
Address 

Binary Count 

Double Word 
Retrieved 

1 

11 

00 

11 

2 

11 

01 

10 

3 

11 

10 

01 

4 

11 

11 

00 


Table 12.17 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 1 1 2 


Cycle 

Starting Block 
Address 

Binary Count 

Double Word 
Retrieved 

1 

01 

00 

01 

2 

01 

01 

00 

3 

01 

10 

11 

4 

01 

11 

10 


Table 12. 18 Sequence of Doublewords Transferred Using Subblock 
Ordering: Address 01 2 


For block write requests, the processor always delivers the address of 
the doubleword at the beginning of the block; the processor delivers data 
beginning with the doubleword at the beginning of the block and 
progresses sequentially through the doublewords that form the block. 
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During data cycles, the valid byte lines depend upon the position of the 
data with respect to the aligned doubleword (this may be a byte, halfword, 
tribyte, quadbyte/word, quintibyte, sextibyte, septibyte, or an octalbyte/ 
doubleword). For example, in little-endian mode, on a byte request where 
the address modulo 8 is 0, SysAD(7:0) are valid during the data cycles. 

Table 12.19 shows the byte lanes used for partial word transfers for 
both little and big endian. 
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Processor Internal Address Map 

External reads and writes provide access to processor internal 
resources that may be of interest to an external agent. The processor 
decodes bits SysAD(6:0) of the address associated with an external read 
or write request to determine which processor internal resource is the 
target. 

However, the R4600/R4700 does not contain any resources that are 
readable through an external read request. Therefore, in response to an 
external read request the processor returns undefined data and a data 
identifier with its Erroneous Data bit, SysCmd(5), set. 

The Interrupt register is the only processor internal resource available 
for write access by an external request. The Interrupt register is accessed 
by an external write request with an address of 000 2 on bits 6:4 of the 
SysAD bus. 

The interrupt register is described in detail in Chapter 13, 
“R4600/R4700 Processor Interrupts.” 
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Interrupts 



Introduction 

The R4600/R4700 processor supports the following Interrupts: six 
hardware interrupts, one internal “timer interrupt,” two software 
interrupts, and one nonmaskable interrupt. The processor takes an 
exception on any interrupt. 

This chapter describes the six hardware and single nonmaskable 
interrupts. A description of the software and the timer interrupts can be 
found in Chapter 5. CPU exception processing is also described in Chapter 
5. Floating-point exception processing is described in Chapter 6. 

Hardware Interrupts 

The six CPU hardware interrupts can be caused by external write 
requests to the R4600/R4700, or can be caused through dedicated 
interrupt pins. These pins are latched into an internal register by the rising 
edge of SClock. 

Nonmaskable Interrupt (NMI) 

The nonmaskable interrupt is caused either by an external write request 
to the R4600/R4700 or by a dedicated pin in the R4600/R4700. This pin 
is latched into an internal register by the rising edge of SClock. 

Asserting Interrupts 

External writes to the CPU are directed to various internal resources, 
based on an internal address map of the processor. When SysAD[6:0] = 0 
during an ADDR cycle of external write request, an external write to any 
address writes to an architecturally transparent register called the 
Interrupt register; this register is available for external write cycles, but not 
for external reads. 

During a data cycle, SysAD[22:16] are the write enables for the seven 
individual Interrupt register bits (0 = disabled, 1 = enabled) and SysAD[6:0] 
are the values to be written into these bits (0 = no interrupt, 1 = interrupt). 
This allows any subset of the Interrupt register to be set or cleared with a 
single write request. Figure 13.1 shows the mechanics of an external write 
to the Interrupt register. 


SysAD(6:0) Interrupt Value 


n 

IB 

B 

B 

B 

B 

a 





B 

mu 

B 

B 

B 

D 


SysAD(22:16) Write Enables 


( 


1 

j 

I 

B 

I 

9 

I 

B 

1 

9 

l 

□ 


Interrupt register 


See Figure 13.2 
and Figure 13.3. 


Figure 13.1 Interrupt Register Bits and Enables 
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Figure 13.2 shows how the R4600/R4700 Interrupts are readable 
through the Cause register. The interrupt bits, Int*(5:0), are latched into 
the internal register by the rising edge of SCIock. 

• Bit 5 of the Interrupt register in the R4600/R4700 is ORed with the 
Int*(5) pin and then multiplexed with the internal Timerlnterrupt 
signal. This result is directly readable as bit 15 of the Cause register. 

• Bits 4:0 of the Interrupt register are bit-wise ORed with the current 
value of the interrupt pins Int*[4:0] and the result is directly readable 
as bits 14:10 of the Cause register. 


SCIock - 



See 

Figure 13.4 


Figure 13.2 R4600/R4700 Interrupt Signals 


Figure 13.3 shows the internal derivation of the NMI signal, for the 
R4600/R4700 processor. 

The NMI* pin is latched into an internal register by the rising edge of 
SCIock. Bit 6 of the Interrupt register is then ORed with the inverted value 
of NMI* to form the nonmaskable interrupt. Only the one falling edge of the 
latched signed will cause the NMI. 



Figure 13.3 R4600/R4700 Nonmaskable Interrupt Signal 
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Figure 13.4 shows the masking of the R4600/R4700 interrupt signal. 

• Cause register bits 15:8 (IP7-IP0) are AND-ORed with Status register 
interrupt mask bits 15:8 (IM7-IM0) to mask individual interrupts. 

• Status register bit 0 is a global Interrupt Enable (IE). It is ANDed with 
the output of the AND-OR logic to produce the R4600/R4700 inter- 
rupt signal. 
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Introduction 

This chapter describes the Error Checking mechanism used in the 
R4600/R4700 processor. 

Error Checking in the Processor 

Error checking codes allow the processor to detect and sometimes 
correct errors made when moving data from one place to another. 

Two major types of data errors can occur in data transmission: 

• hard errors, which are permanent, arise from broken interconnects, 
internal shorts, or open leads 

• soft errors, which are transient, are caused by system noise, power 
surges, and alpha particles. 

Hard errors must be corrected by physical repair of the damaged 
equipment and restoration of data from backup. Soft errors can be 
corrected by using error checking and correcting codes. 

Types of Error Checking 

The R4600/R4700 uses parity (error detection only). 

Parity Error Detection 

Parity is the simplest error detection scheme. By appending a bit to the 
end of an item of data — called a parity bit — single bit errors can be 
detected; however, these errors cannot be corrected. 

There are two types of parity: 

• Odd Parity adds 1 to any even number of Is in the data, making the 
total number of Is odd (including the parity bit). 

• Even Parity adds 1 to any odd number of Is in the data, making the 
total number of Is even (including the parity bit). 

Odd and even parity are shown in the example below: 

Dataf3:Q) Odd Parity Bit Even Parity Bit 

0 0 1 0 0 1 

The example above shows a single bit in Data(3:0) with a value of 1; this 
bit is Data(l). 

• In even parity, the parity bit is set to 1. This makes 2 (an even num- 
ber) the total number of bits with a value of 1. 

• Odd parity makes the parity bit a 0 to keep the total number of 1 -val- 
ue bits an odd number — in the case shown above, the single bit Da- 

tad). 

The example below shows odd and even parity bits for various data 
values: 


Dataf3:0) 

0 110 
0 0 0 0 
1111 
110 1 


Odd Parity Bit 

1 

1 

1 

0 


Even Parity Bit 

0 

0 

0 

1 


Parity allows single-bit error detection, but it does not indicate which bit 
is in error — for example, suppose an odd-parity value of 00011 arrives. 
The last bit is the parity bit, and since odd parity demands an odd number 
(1,3,5) of Is, this data is in error: it has an even number of Is. However it 
is impossible to tell which bit is in error. 
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Error Checking Operation 

The processor verifies data correctness by using parity as it passes data 
from/to the system interface to/from the primary caches. 

System Interface 

The processor generates correct check bits for doubleword, word, or 
partial-word data transmitted to the system interface. As it checks for data 
correctness, the processor passes data check bits from the primary cache, 
directly without changing the bits, to the system interface. 

The processor does not check data received from the system interface for 
external writes. By setting the NChckbit in the data identifier, it is possible 
to prevent the processor from checking read response data from the 
system interface. 

For cache refill, if the NChck bit is set, the CPU will generally correct 
parity before placing data into the cache. The R4600/R4700 only checks 
parity for the first double word returned on a block instruction fetch, that 
is, for the double word that contains the instruction that was missed on 
in the cache. This double word is checked just as if it had been read out 
of the ICache. This parity check is done as a byte parity check. For single 
read, and with the NChck bit set, the CPU will check parity for all 64-bit, 
even if the transfer size is less than that. 

When the R4600/R4700 is checking parity it does not actually 
regenerate the word parity, but rather turns the byte parity supplied by the 
system into word parity. It XORS the bits in groups of four. As a result, if 
bad byte parity is supplied by the system, bad word parity will get written 
into the cache. This is done to be consistent with what happens in the 
DCache. 

The processor does not check addresses received from the system 
interface and does not generate correct check bits for addresses 
transmitted to the system interface. 

The processor does not contain a data corrector; instead, the processor 
takes a cache error exception when it detects an error based on data check 
bits. Software is responsible for error handling. 

System Interface Command Bus 

In the R4600/R4700 processor, the system interface command bus has 
no parity. SysCmdP always drives zero out for CPU valid cycles and is not 
checked when the system interface is in slave state. 
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Summary of Error Checking Operations 

Error Checking operations are summarized in Table 14.1 and 
Table 14.2. 


Bus 

Uncached 

Load 

Uncached 

Store 

Primary Cache 
Load from System 
Interface 

Primary Cache 
Write to System 
Interface 

Cache 

Instruction 

Processor Data 

From System 
Interface 

Not 

Checked 

From System Inter- 
face unchanged 

Checked; Trap 
on Error 

Check on 
cache write- 
back; Trap on 
Error 

System Interface 
Address /Com- 
mand and Check 
Bits: Transmit 

Not 

Generated 

Not 

Generated 

Not Generated 

Not Generated 

Not Generated 

System Interface 
Address /Com- 
mand and Check 
Bits: Receive 

Not Checked 

NA 

Not Checked 

NA 

NA 

System Interface 
Data 

Checked; 

Trap on Error 

From Pro- 
cessor 

Checked; Trap on 
Error 

From Primary 
Cache 

From Primary 
Cache 

System Interface 
Data Check Bits 

Checked; 

Trap on Error 

Generated 

Checked; Trap on 
Error 

From Primary 
Cache 

From Primary 
Cache 


Table 14.1 Error Checking and Correcting Summary for Internal Transactions 


Bus 

Read 

Request 

Write Request 

Processor Data 

NA 

NA 

System Interface Address, Command, and Check Bits: Trans- 
mit 

Generated 

NA 

System Interface Address, Command, and Check Bits: Receive 

Not Checked 

Not Checked 

System Interface Data 

From Processor 

Checked; Trap on Error 

System Interface Data Check Bits 

Generated 

Checked; Trap on Error 


Table 14.2 Error Checking and Correcting Summary for External Transactions 
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Introduction 

This appendix provides a detailed description of the operation of each 
R4600/R4700 instruction. The instructions are listed in alphabetical 
order. 

Exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate cause and manner of handling exceptions are omitted from the 
instruction descriptions in this appendix. 

Figures at the end of this appendix list the bit encoding for the constant 
fields of each instruction, and the bit encoding for each individual 
instruction is included with that instruction. 


Instruction Classes 

CPU instructions are divided into the following classes: 

• Load and Store instructions move data between memory and general 
registers. They are all I-type instructions, since the only addressing 
mode supported is base register + 16-bit immediate offset. 

• Computational instructions perform arithmetic, logical and shift op- 
erations on values in registers. They occur in both R-type (both 
operands are registers) and I-type (one operand is a 16-bit immediate) 
formats. 

• Jump and Branch instructions change the control flow of a program. 
Jumps are always made to absolute 26-bit word addresses (J-type 
format), or register addresses (R-type), for returns and dispatches. 
Branches have 16-bit offsets relative to the program counter (I-type). 
Jump and Link instructions save their return address in register 31. 

• Coprocessor instructions perform operations in the coprocessors. 
Coprocessor loads and stores are I-type. Coprocessor computational 
instructions have coprocessor-dependent formats (see the FPU in- 
structions in Appendix B). Coprocessor zero (CPO) instructions ma- 
nipulate the memory management and exception handling facilities of 
the processor. 

• Special instructions perform a variety of tasks, including movement 
of data between special and general registers, trap, and breakpoint. 
They are always R-type. 
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Instruction Formats 

Every CPU Instruction consists of a single word (32 bits) aligned on a 
word boundary and the major instruction formats are shown in Figure A. 1 . 


1-Type (Immediate) 

31 26 25 21 20 16 15 0 


op 

rs rt 

immediate || 

J-Type (Jump) 

31 26 

25 

0 


op 

target | 

R-Type (Register) 

31 26 25 21 20 16 15 1110 6 5 0 


op 

rs rt 

rd shamt funct || 

op 6-bit 

rs 5-bit « 

rt 5-bit 

immediate 16-bit 
displa 

target 26-bit 

rd 5-bit 

shamt 5-bit 

funct 6-bit 

operation code 

source register specifier 

:arget (source /destination) or branch condition 

immediate, branch displacement or address 
cement 

Jump target address 

destination register specifier 

shift amount 

function field 


Figure A. 1 CPU Instruction Formats 


Instruction Notation Conventions 

In this appendix, all variable subfields in an instruction format (such 
as rs, rt, immediate, etc.) are shown in lowercase names. 

For the sake of clarity, we sometimes use an alias for available subfield 
in the formats of specific instructions. For example, we use rs = base in 
the format for load and store instructions. Such an alias is always lower 
case, since it refers to a variable subfield. 

Figures with the actual bit encoding for all the mnemonics are located 
at the end of this Appendix, and the bit encoding also accompanies each 
instruction. 

In the instruction descriptions that follow, the Operation section 
describes the operation performed by each instruction using a high-level 
language notation. 
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Special symbols used in the notation are described in Table A. 1 


Symbol 

Meaning 

4 - 

Assignment. 

ii 

Bit string concatenation. 

Xy 

Replication of bit value x into a y-bit string. Note: x is always a single-bit 


Selection of bits y through z of bit string x. Little-endian bit notation is always 

HhhH 

used. If y is less than z, this expression is an empty (zero length) bit string. 

+ 

2’s complement or floating-point addition. 

- 

2’s complement or floating-point subtraction. 

* 

2’s complement or floating-point multiplication. 

div 

2’s complement integer division. 

mod 

2's complement modulo. 

/ 

Floating-point division. 

< 

2’s complement less than comparison. 

and 

Bit-wise logical AND. 

or 

Bit-wise logical OR. 

xor 

Bit-wise logical XOR. 

nor 

Bit-wise logical NOR. 

GPR[x] 

General-Register x. The content of GPR[0] is always zero. Attempts to alter the 
content of GPR[0] have no effect. 

CPR[z,x] 

Coprocessor unit z, general register x. 

CCR[z,x] 

Coprocessor unit z, control register x. 

COC[z] 

Coprocessor unit z condition signal. 

BigEndianMem 

Big-endian mode as configured at reset (0 — » Little, 1 -» Big). Specifies the endi- 
anness of the memory interface (see LoadMemory and StoreMemory), and the en- 
dianness of Kernel and Supervisor mode execution. 

ReverseEndian 

Signal to reverse the endianness of load and store instructions in User mode; 
effected by setting the ftEbit of the Status register. Thus, ReverseEndian may be 
computed as (SR 25 and User mode). 

BigEndianCPU 

The endianness for load and store instructions (0 -» Little, 1 — » Big). In User 
mode, this endianness may be reversed by setting SR 25 . Thus, BigEndianCPU 
may be computed as BigEndianMem XOR ReverseEndian. 

LLbit 

Bit of state to specify synchronization instructions. Set by LL, cleared by ERET and 
Invalidate and read by SC. 

T+i 

Indicates the time steps between operations. Each of the statements within a time 
step are defined to be executed in sequential order (as modified by conditional and 
loop constructs). Operations which are marked T+i: are executed at instruction cy- 
cle / relative to the start of execution of the instruction. Thus, an instruction which 
starts at time j executes operations marked T+/V at time 

/ + j. The interpretation of the order of execution between two instructions or two 
operations which execute at the same time should be pessimistic; the order is not 
defined. 


Table A.1 CPU Instruction Operation Notations 
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Instruction Notation Examples 

The following examples illustrate the application of some of the 
instruction notation conventions: 


Example #1: 

. ' / GPRlrtl ImiSi^dlate 1 I G* 6 • ' 

Sixteen zero bits are concatenated with an immediate value 
(typically 16 bits), and the 32-bit string (with the lower 16 buts 
set to zero) is assigned to General-Purpose Register rt. 

Example # 2 : 

(inunediate 1 i immediate i ) . . 

Bit 15 (the sign bit) of an immediate value is extended for 
16 bit positions, and the result is concatenated with bits 15 
through 0 of the immediate value to form a 32 -bit sign 
extended value. 


Load and Store Instructions 

In the R4600/R4700, as In the case of processors, the instruction 
immediately following a load may use the loaded contents of the register. 
In such cases, the hardware interlocks, requiring additional real cycles, so 
scheduling load delay slots is still desirable, although not required for 
functional code. 

Two special instructions are provided in the R4600/R4700 
implementation of the MIPS ISA, Load Linked and Store Conditional. 
These instructions are used in carefully coded sequences to provide one of 
several synchronization primitives, including test-and-set, bit-level locks, 
semaphores, and sequencers/event counts. 

In the load and store descriptions, the functions listed in Table A.2 are 
used to summarize the handling of virtual addresses and physical 
memory. 


Function 

Meaning 

AddressTranslation 

Uses the TLB to find the physical address given the virtual 
address. The function fails and an exception is taken if the 
required translation is not present in the TLB. 

LoadMemory 

Uses the cache and main memory to find the contents of 
the word containing the specified physical address. The 
low-order two bits of the address and the Access Type field 
indicates which of each of the four bytes within the data 
word need to be returned. If the cache is enabled for this 
access, the entire word is returned and loaded into the 
cache. 

StoreMemory 

Uses the cache, write buffer, and main memory to store 
the word or part of word specified as data in the word con- 
taining the specified physical address. The low-order two 
bits of the address and the Access Type field indicates 
which of each of the four bytes within the data word 
should be stored. 


Table A.2 Load and Store Common Functions 
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As shown in Table A. 2, the Access Type field indicates the size of the 
data item to be loaded or stored. Regardless of access type or byte- 
numbering order (endianness), the address specifies the byte which has 
the smallest byte address in the addressed field. For a big-endian 
machine, this is the leftmost byte and contains the sign for a 2’s 
complement number; for a little-endian machine, this is the rightmost 
byte. 


Access Type Mnemonic 

Value 

Meaning 

DOUBLEWORD 

7 

8 bytes (64 bits) 

SEPTIBYTE 

6 

7 bytes (56 bits) 

SEXTIBYTE 

5 

6 bytes (48 bits) 

QUINTIBYTE 

4 

5 bytes (40 bits) 

WORD 

3 

4 bytes (32 bits) 

TRIPLEBYTE 

2 

3 bytes (24 bits) 

HALFWORD 

1 

2 bytes (16 bits) 

BYTE 

0 

1 byte (8 bits) 


Table A3 Access Type Specifications for Loads/Stores 


The bytes within the addressed doubleword which are used can be 
determined directly from the access type and the three low-order bits of the 
address. 

Jump and Branch Instructions 

All jump and branch instructions have an architectural delay of exactly 
one instruction. That is, the instruction immediately following a jump or 
branch (that is, occupying the delay slot) is always executed while the 
target instruction is being fetched from storage. A delay slot may not itself 
be occupied by a jump or branch instruction; however, this error is not 
detected and the results of such an operation are undefined. 

If an exception or interrupt prevents the completion of a legal 
instruction during a delay slot, the hardware sets the EPC register to point 
at the jump or branch instruction that precedes it. When the code is 
restarted, both the jump or branch instructions and the instruction in the 
delay slot are reexecuted. 

Because jump and branch instructions may be restarted after 
exceptions or interrupts, they must be restartable. Therefore, when a 
jump or branch instruction stores a return link value, register 31 (the 
register in which the link is stored) may not be used as a source register. 

Since instructions must be word-aligned, a Jump Register or Jump 
and Link Register instruction must use a register whose two low-order 
bits are zero. If these low-order bits are not zero, an address exception will 
occur when the jump target instruction is subsequently fetched. 
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Coprocessor Instructions 

Coprocessors are alternate execution units, which have register files 
separate from the CPU. The R4600/R4700 architecture (MIPS III) provides 
three coprocessor units, or classes, and these coprocessors have two 
register spaces, each space containing thirty-two registers. These registers 
may be either 32-bits or 64-bits wide. 

• The first space, coprocessor general registers, may be directly loaded 
from memory and stored into memory, and their contents may be 
transferred between the coprocessor and processor. 

• The second space, coprocessor control registers, may only have their 
contents transferred directly between the coprocessor and the proces- 
sor. Coprocessor instructions may alter registers in either space. 

System Control Coprocessor (CPO) Instructions 

There are some special limitations imposed on operations involving 
CPO that is incorporated -within the CPU. The move to/from coprocessor 
instructions are the only valid mechanism for writing to and reading from 
the CPO registers. 

Several CPO instructions are defined to directly read, write, and probe 
TLB entries and to modify the operating modes in preparation for returning 
to User mode or interrupt-enabled states. 
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ADD Add ADD 


31 


26 25 


21 20 


16 15 


11 10 


6 5 


SPECIAL 

000000 

rs 

rt 

rd 

0 

00000 

ADD 

100000 

6 

5 

5 

5 

5 

6 


Format: 

ADD rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register 
rd. The operands must be valid sign-extended, 32-bit values. 

An overflow exception occurs if the carries out of bits 30 and 31 differ 
(2’s complement overflow). The destination register rd is not modified when 
an integer overflow exception occurs. 

Operation: 

T : temp <- GPR[rs] + GPR[rt] 

GPR[rd] <- (temp 31 ) 32 II temp 31 0 


Exceptions: 

Integer overflow exception 
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ADDI Add Immediate ADDI 


31 26 

25 21 

20 16 

15 

0 

ADDI 

001 000 

rs 

rt 

immediate 

6 

5 

5 

16 



Format: 

ADDI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. The rs operand must be valid sign-extended, 32-bit values. 

An overflow exception occurs if carries out of bits 30 and 31 differ (2’s 
complement overflow). The destination register rt is not modified when an 
integer overflow exception occurs. 

Operation: 

T: temp <- GPR[rs] + (immediate-^) 48 I I immediate 15--0 

GPR[rt] <- (temp 31 ) 32 II temp 31i 0 


Exceptions: 

Integer overflow exception 
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ADDIU Add Immediate Unsigned ADDIU 


31 26 

25 21 

20 16 

15 

0 

ADDIU 

001001 

rs 

rt 

immediate 

6 

5 

5 

16 



Format: 

ADDIU rt, rs, immediate 

Description: 

The 16-hit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. No integer overflow exception occurs under any circumstances. 
The rs operand must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the ADD1 instruction 
is that ADDIU never causes an overflow exception. 

Operation: 


T: temp <- GPR[rs] + (immediate 15 ) 48 I I immediate 150 

GPR[rt] <- (temp 31 ) 32 II temp 310 


Exceptions: 

None 
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ADDU Add Unsigned ADDU 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

ADDU 

000000 




00000 

1 00001 

6 

5 

5 

5 

5 

6 


Format: 

ADDU rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 
No overflow exception occurs under any circumstances. The source 
operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the ADD instruction 
is that ADDU never causes an overflow exception. 

Operation: 

T: temp <- GPR[rs] + GPRfrt] 

GPR[rd] <- (temp 3 i) 32 II temp 31 0 


Exceptions: 

None 
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AND And AND 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

AND 

000000 




00000 

100100 

6 

5 

5 

5 

5 

6 


Format: 

AND rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical AND operation. The result is placed 
into general register rd. 

Operation: 


T: GPR[rd] <- GPR[rs] and GPR[rt] 


Exceptions: 

None 














CPU Instruction Set Details 


Appendix A 


ANDI And Immediate ANDI 


31 26 25 21 20 16 15 0 


ANDI 

rs 

rt 

immediate 

001100 




6 

5 

5 

16 


Format: 

ANDI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical AND operation. The result is 
placed into general register rt. 

Operation: 


T: GPR[rt] <- 0 48 II (immediate and GPR[rs] 15 0 ) 


Exceptions: 

None 
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BCzF 


Branch On Coprocessor z False 


BCzF 


31 26 25 21 20 16 15 0 


COPz 

BC 

BCF 

offset 

0 1 0 0 x x* 

0 1000 

00000 


6 

5 

5 

16 


Format: 

BCzF offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If coprocessor z’s condition signal (CpCond), as 
sampled during the previous instruction, is false, then the program 
branches to the target address with a delay of one instruction. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

Operation: 


T-1 : condition <- not COC[z] 

T: target (off set 15 ) 46 II offset II 0 2 

T+1: if condition then 

PC <- PC + target 

endif 


Note: *See the table “Opcode Bit Encoding” on next page, or “CPU 
Instruction Opcode Bit Encoding” at the end of Appendix A. 

Exceptions: 

Coprocessor unusable exception 

Opcode Bit Encoding: 


BCzF Bi * * 

BC0F 

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0 

0 

1 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 



Bit# 

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0 

BC1F 

0 

1 

0 

0 

0 

1 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 



Bit# 

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0 

BC2F 

0 

1 

0 

0 

1 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 



^ / 

^ J 

Y J 

k J 


Y > 

Opcode 

Coprocessor Unit Number 

BC sub-opcode Branch condition 
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nA_ri Branch On Coprocessor z nA.ri 
DV/ZP L False Likely DV/ZPL 

”31 26 25 21 20 16 15 (T| 


COPz 

BC 

BCFL 

offset 

0 1 0 0 x x* 

01 000 

0001 0 


6 

5 

5 

16 


Format: 

BCzFL offset 


Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of coprocessor z’s condition signal, as 
sampled during the previous instruction, is false, the target address is 
branched to with a delay of one instruction. 

If the conditioned branch is not taken, the instruction in the branch 
delay slot is nullified. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 


Operation: 


T — 1 : 
T: 

T+1: 


condition <- not COC[z] 
target <- (offset! 5 ) 46 II offset II 0 2 
if condition then 

PC <- PC + target 

else 


NullifyCurrentlnstruction 

endif 


Exceptions: 

Coprocessor unusable exception 

Opcode Bit Encoding: 
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BCzT Branch On Coprocessor z True BCzT 


26 25 


21 20 


1615 


COPz 

BC 

BCT 

offset 

01 0 0 x x* 

01000 

00001 


6 

5 

5 

16 


Format: 

BCzT offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the coprocessor z’s condition signal (CpCond) is 
true, then the program branches to the target address, with a delay of one 
instruction. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 

Operation: 


T-1 : condition <- COC[zl 
T: target <- (offset 15 ) 4 ° II offset II 0 2 

T+1: if condition then 

PC PC + target 

endif 


NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions: 

Coprocessor unusable exception 

Opcode Bit Encoding: 


.T Rit a 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 O 


BCzT Bit# r 


BOOT 


0100 0 00100 000 001 


B j t# 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 


BC1T 


0100 010100 000 001 


B j t # 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 

BC2T 0 1 1 1 0 I 0 1 1 1 0 1 0 | 1 1 0 1 0 |o|o|o| o|o| 1~ 


Opcode 

Coprocessor Unit Number- 


0 

i 

0 

0 

1 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

1 


v : z? 






BC sub-opcode Branch condition 
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BCzTL BrancMDn Coprocessor z BCzTL 


31 26 25 21 20 1615 0 


COPz 

BC 

BCTL 

offset 

0 1 0 0 x x* 

01 000 

0001 1 


6 

5 

5 

16 


Format: 

BCzTL offset 


Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of coprocessor z’s condition signal, as 
sampled during the previous instruction, is true, the target address is 
branched to with a delay of one instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Because the internal condition signal is sampled during the previous 
instruction, there must be at least one instruction between this instruction 
and a coprocessor instruction that changes the internal condition signal. 


Operation: 


T-1: 

T: 

T+1: 


condition <- COC[zl 

target <- (offset 15 ) 46 ll offset II 0 2 

if condition then 

PC «- PC + target 

else 


NullifyCurrentlnst ruction 

endif 


NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 


Exceptions: 

Coprocessor unusable exception 


Opcode Bit Encoding: 
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BEQ Branch On Equal BEQ 


31 26 25 21 20 16 15 0 


BEQ 

rs 

rt 

offset 

000100 




6 

5 

5 

16 


Format: 

BEQ rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, then the 
program branches to the target address, with a delay of one instruction. 

Operation: 

T: target <- (offset 15 ) 46 II offset II 0 2 

condition <- (GPR[rs] = GPR[rt]) 

T+1: if condition then 

PC «- PC + target 

end if 


Exceptions: 

None 
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BEQL Branch On Equal Likely BEQL 


31 26 25 21 20 16 15 0 


BEQL 

rs 

rt 

offset 

010100 




6 

5 

5 

16 


Format: 

BEQL rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, the target 
address is branched to, with a delay of one instruction. If the conditional 
branch is not taken, the instruction in the branch delay slot is nullified. 

Operation: 

T : target 4 - (offset., 5 ) 46 1 1 offset 1 1 0 2 

condition 4 - (GPR[rs] = GPR[rt]) 

T+1: if condition then 

PC <- PC + target 

else 

NullifyCurrentlnstruction 

endif 


Exceptions: 

None 
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PPC7 Branch On Greater Than npr7 

Or Equal To Zero DUEL 


31 26 

25 21 

20 16 

15 0 

REGIMM 

rs 

BGEZ 

offset 

000001 


00001 


6 

5 

5 

16 


Format: 

BGEZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. 

Operation: 

T: target <- (offset-^) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 0) 

T+1 : if condition then 

PC 4 - PC + target 

endif 


Exceptions: 

None 
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BGEZAL 


Branch On Greater Than 
Or Equal To Zero And Link 


BGEZAL 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BGEZAL 

offset 

000001 


1 0001 


6 

5 

5 

16 


Format: 

BGEZAL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction is not 
trapped, however. 

Operation: 

T : target <- (offset 15 ) 46 il offset II 0 2 

condition <- (GPR[rs] 63 = 0) 

GPR[31] <- PC + 8 
T+1: if condition then 
PC <- PC + target 
end if 


Exceptions: 

None 
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BGEZALL Bra o r h E°u n a?Toze r r an BGEZALL 

And Link Likely 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BGEZALL 

offset 

000001 


10011 


6 

5 

5 

16 


Format: 

BGEZALL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. General register rs may not 
be general register 31, because such an instruction is not restartable. An 
attempt to execute this instruction is not trapped, however. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 

Operation: 

T: target <- (offset 15 ) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 0) 

GPR[31] <- PC + 8 
T+1 : if condition then 
PC <- PC + target 
else 

NullifyCurrentlnstruction 

endif 


Exceptions: 

None 
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BGEZL 


Branch On Greater 
Than Or Equal To Zero Likely 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BGEZL 

offset 

000001 


0001 1 


6 

5 

5 

16 


Format: 

BGEZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. If the conditional branch is not taken, the instruction in 
the branch delay slot is nullified. 


Operation: 


T: 

target 4- (offset 15 ) 46 II offset II 0 2 
condition 4 - (GPR[rs] 63 = 0) 

T+1: 

if condition then 


PC 4 - PC + target 


else 


NullifyCurrentlnst ruction 

endif 



Exceptions: 

None 
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BGTZ Branch On Greater Than Zero BGTZ 


31 26 

25 21 

20 16 

15 0 

BGTZ 

rs 

0 

offset 

000111 


00000 


6 

5 

5 

16 


Format: 

BGTZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit cleared and are 
not equal to zero, then the program branches to the target address, with a 
delay of one instruction. 

Operation: 


T: 

target <- (offset 15 ) 46 II offset II 0 2 

condition <- (GPR[rs ] 6 3 = 0) and (GPR[rs] * 0 64 ) 

T+1: 

if condition then 


PC PC + target 
endif 


Exceptions: 

None 
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BGTZL 


Branch On Greater 
Than Zero Likely 


BGTZL 


31 26 

25 

21 

20 16 

15 

0 

BGTZL 

010111 

rs 

0 

00000 

offset 

6 


5 

5 

16 



Format: 

BGTZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit cleared and are 
not equal to zero, then the program branches to the target address, with a 
delay of one instruction. If the conditional branch is not taken, the 
instruction in the branch delay slot is nullified. 

Operation: 

T- target <- (offset^) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 0) and (GPR[rs] * 0 64 ) 

T+1: if condition then 

PC <- PC + target 

else 

NullifyCurrentlnst ruction 

endif 


Exceptions: 

None 
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n | r7 Branch on Less Than ni C7 

DLCl Or Equal To Zero DLCti 


31 26 

25 21 

20 16 

15 0 

BLEZ 

rs 

0 

offset 

000110 


00000 


6 

5 

5 

16 


Format: 

BLEZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs are compared to 
zero. If the contents of general register rs have the sign bit set, or are equal 
to zero, then the program branches to the target address, with a delay of 
one instruction. 

Operation: 

T: target «- (offset 5 ) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 1) and (GPR[rs] = 0 64 ) 

T+1: if condition then 

PC <- PC + target 

endif 


Exceptions: 

None 
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n | CTTI Branch on Less Than n i F71 

DLC^IL Or Equal To Zero Likely dLCZ.L 


31 26 

25 21 

20 16 

15 0 

BLEZL 

rs 

0 

offset 

010110 


00000 


6 

5 

5 

16 


Format: 

BLEZL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs is compared to zero. 
If the contents of general register rs have the sign bit set, or are equal to 
zero, then the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Operation: 

T: target <- (offset 15 ) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 1) and (GPR[rs] = 0 64 ) 

T+1: if condition then 

PC <r- PC + target 

else 

NullifyCurrentlnstruction 

endif 


Exceptions: 

None 
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BLTZ Branch On Less Than Zero BLTZ 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BLTZ 

offset 

000001 


00000 


6 

5 

5 

16 


Format: 

BLTZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
set, then the program branches to the target address, with a delay of one 
instruction. 

Operation: 

T : target <- (offset-^) 46 II offset II 0 2 

condition <- (GPR[rs ] 6 3 = 1) 

T+1: if condition then 

PC <- PC + target 

endif 


Exceptions: 

None 
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BLTZAL ThWro^fnk BLTZAL 


31 26 

25 21 

20 16 

15 0 

REGIMM 

rs 

BLTZAL 

offset 

000001 


1 0000 


6 

5 

5 

16 


Format: 

BLTZAL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. 

Operation: 

T: target 4- (offset., 5 ) 46 II offset II 0 2 

condition 4- (GPR[rs] 63 = 1) 

GPR[31] 4 - PC + 8 
T+1: if condition then 

PC <- PC + target 

endif 


Exceptions: 

None 















CPU Instruction Set Details 


Appendix A 


BLTZALL 


Branch On Less 
Than Zero And Link Likely 


BLTZALL 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BLTZALL 

offset 

000001 


10010 


6 

5 

5 

16 


Format: 

BLTZALL rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. Unconditionally, the address of the instruction after 
the delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. If the conditional 
branch is not taken, the instruction in the branch delay slot is nullified. 

Operation: 

T: target «- (offset 15 ) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 1) 

GPR[31] <- PC + 8 
T+1: if condition then 

PC <- PC + target 

else 

NullifyCurrentlnst ruction 

endif 


Exceptions: 

None 
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BLTZL Branch On Less Than Zero Likely BLTZL 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

BLTZL 

offset 

000001 


0001 0 


6 

5 

5 

16 


Format: 

BLTZ rs, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the contents of general register rs have the sign bit 
set, then the program branches to the target address, with a delay of one 
instruction. If the conditional branch is not taken, the instruction in the 
branch delay slot is nullified. 

Operation: 

T: target <- (offset-, 5 ) 46 II offset II 0 2 

condition <- (GPR[rs] 63 = 1) 

T+1: if condition then 

PC 4- PC + target 

else 

NullifyCurrentlnst ruction 

endif 


Exceptions: 

None 
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Branch On Not Equal 




31 26 25 21 20 16 15 0 


BNE 

rs 

rt 

offset 

000101 




6 

5 

5 

16 


Format: 

BNE rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 

Operation: 

T: target*- (off set i 5 ) 46 II offset II 0 2 

condition <- (GPR[rs] * GPR[rt]) 

T+1: if condition then 

PC *- PC + target 

endif 


Exceptions: 

None 
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BNEL Branch On Not Equal Likely BNEL 


31 26 25 21 20 16 15 0 


BNEL 

rs 

rt 

offset 

010101 




6 

5 

5 

16 


Format: 

BNEL rs, rt, offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch 
delay slot is nullified. 

Operation: 

T: target «- (offset i 5 ) 46 II offset II 0 2 

condition <- (GPR[rs] * GPRfrt]) 

T+1 : if condition then 

PC <- PC + target 

else 

NullifyCurrentlnstruction 

endif 


Exceptions: 

None 
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BREAK Breakpoint BREAK 


31 26 25 6 5 0 


SPECIAL 

code 

BREAK 

000000 


001101 

6 

20 



Format: 

BREAK 

Description: 

A breakpoint trap occurs, immediately and unconditionally 
transferring control to the exception handler. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 


T: BreakpointException 


Exceptions: 

Breakpoint exception 


A- 33 
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CACHE Cache CACHE 


31 26 

25 

21 

20 16 

15 

0 

CACHE 

101111 

base 

op 

offset 

6 


5 

5 

16 



Format: 

CACHE op, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The virtual address is translated 
to a physical address using the TLB, and the 5-bit sub-opcode specifies a 
cache operation for that address. 

If CPO is not usable (User or Supervisor mode) the CPO enable bit in the 
Status register is clear, and a coprocessor unusable exception is taken. 
The operation of this instruction on any operation/cache combination not 
listed below is undefined. The operation of this instruction on uncached 
addresses is also undefined. 

The R4600/R4700 uses only the tag comparisons, not the valid bits, to 
choose which data it supplies to the instruction unit. This makes it 
important that the tags of the A and B sets are never the same. 

The Index operation uses part of the virtual address to specify a cache 
block, with vAddr 13 selecting the set to access. 

For a primary cache of 16KB with 32 bytes per tag, vAddr 12 . 5 specifies 
the block. 

Index Load Tag also uses vAddr 4 -3 to select the doubleword for reading 
parity. When the CE bit of the Status register is set, Hit WriteBack, Hit 
WriteBack Invalidate, Index WriteBack Invalidate, and Fill also use 
vAddr 4 3 to select the doubleword that has its parity modified. This 
operation is performed unconditionally. 

The Hit operation accesses the specified cache as normal data 
references, and performs the specified operation if the cache block 
contains valid data with the specified physical address (a hit). If both sets 
are invalid or contain different addresses (a miss), no operation is 
performed. 

Write back from a primary cache goes to memory. The address to be 
written is specified by the cache tag and not the translated physical 
address. 

TLB Refill and TLB Invalid exceptions can occur on any operation. For 
Index operations (where the physical address is used to index the cache 
but need not match the cache tag) unmapped addresses may be used to 
avoid TLB exceptions. This operation never causes TLB Modified or Virtual 
Coherency exceptions. 

Bits 17.. 16 of the instruction specify the cache as follows: 




Cache 

0 

I 

primary instruction 

1 

D 

primary data 

2-3 

NA 

Undefined 


A -34 
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Bits 20.. 18 (this value is listed under the Code column) of the 
instruction specify the operation as follows: 


Code 

Caches 

Name 

Operation 

0 

I 

Index Invalidate 

Set the cache state of the cache block to Invalid. 
Index_Invalidate_I writes the physical address of the 
cache op into the tag when it clears the valid bit, which 
is different from the R4000. 

0 

D 

Index Write- 
Back Invalidate 

Examine the cache state and W bit of the primary data 
cache block at the index specified by the virtual 
address. If the state is not Invalid and the W bit is set, 
then write back the block to memory. The address to 
write is taken from the primary cache tag. Set cache 
state of primary cache block to Invalid. 

1 

I, D 

Index Load Tag 

Read the tag for the cache block at the specified index 
and place it into the TagLo CPO registers, ignoring par- 
ity errors. Also load the data parity bits into the ECC 
register. 

2 

I, D 

Index Store Tag 

Write the tag for the cache block at the specified index 
from the TagLo and TagHi CPO registers. 

3 

D 

Create Dirty 
Exclusive 

This operation is used to avoid loading data needlessly 
from memory when writing new contents into an entire 
cache block. If the cache block does not contain the 
specified address, and the block is dirty, write it back 
to the memory. In all cases, set the cache block tag to 
the specified physical address, set the cache state to 
Dirty Exclusive. 

4 

I, D 

Hit Invalidate 

If the cache block contains the specified address, mark 
the cache block invalid. 

5 

D 

Hit WriteBack 
Invalidate 

If the cache block contains the specified address, write 
back the data if it is dirty, and mark the cache block 
invalid. 

5 

I 

Fill 

Fill the primary instruction cache block from memory. 

If the CE bit of the Status register is set, the contents of 
the ECC register is used instead of the computed parity 
bits for addressed doubleword when written to the 
instruction cache. Uses bit 13 to pick the set. 


D 

Hit WriteBack 

If the cache block contains the specified address, and 
the W bit is set, write back the data to memory and 
clear the W bit. 

6 

I 

Hit WriteBack 

If the cache block contains the specified address, write 
back the data unconditionally. 


Operation: 

vAddr <- ((offset 15 ) 48 II offset 15 .. 0 ) + GPR[base] 

(pAddr, uncached) AddressTranslation (vAddr, DATA) 
CacheOp (op, vAddr, pAddr) 


Exceptions: 

Coprocessor unusable exception 


A - 35 
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CFCz 


Move Control From 
Coprocessor 


CFCz 



Format: 

CFCz rt, rd 

Description: 

The contents of coprocessor control register rd of coprocessor unit z are 
loaded Into general register rt. 

This instruction is not valid for CPO. 


Operation: 



Exceptions: 

Coprocessor unusable exception 


’Opcode Bit Encoding: 



A- 36 
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COPz 


Coprocessor Operation 


COPz 


31 26 

25 

24 


0 

COPz 

CO 


cofun 


01 0 0 x x* 

1 




6 

1 


25 



Format: 

COPz cofun 

Description: 

A coprocessor operation is performed. The operation may specify and 
reference internal coprocessor registers, and may change the state of the 
coprocessor condition line, but does not modify state within the processor 
or the cache/memoiy system. Details of coprocessor operations are 
contained in Appendix B. 

Operation: 


T: Coprocessor-Operation (z, cofun) 


Exceptions: 

Coprocessor unusable exception 

Coprocessor interrupt or Floating-Point Exception 


♦Opcode Bit Encoding: 


COPZ Bit# 

31 30 29 28 27 26 25 0 

C0P0 

0 

1 

0 

0 

0 

0 

1 


Bit# 

31 30 29 28 27 26 25 0 

C0P1 

0 

1 

0 

0 

0 

1 

1 


Bit# 

31 30 29 28 27 26 25 0 

C0P2 1 

0 

1 

0 

0 

t] 

0 

1 


Opcode |_q 

CO sub-opcode (see end of Appendix A) 
oprocessor Unit Number 


A- 37 
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Move Control to Coprocessor CTCz 


31 26 

o3 

in 

CM 

20 16 

15 11 

10 0 

COPz 

CT 

rt 

rd 

0 

0 1 OOxx* 

00110 



000 0000 0000 

6 

5 

5 

5 

11 



Format: 

CTCz rt, rd 

Description: 

The contents of general register rt are loaded into control register rd of 
coprocessor unit z. 

This instruction is not valid for CPO. 

Operation: 


T: data <- GPR[rt] 

T + 1: CCR[z,rd] <- data 


Exceptions: 

Coprocessor unusable 

NOTE: *See "CPU Instruction Opcode Bit Encoding" at the end of Appendix A. 


A- 38 
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DADD Doubleword Add DADD 



Format: 

DADD rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 

An overflow exception occurs if the carries out of bits 62 and 63 differ 
(2’s complement overflow). The destination register rd is not modified 
when an integer overflow exception occurs. 

Operation: 

T. GPR[rd] <-GPR[rs] + GPR[rt] 


Exceptions: 

Integer overflow exception 


A- 39 
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DADDI Doubleword Add Immediate DADDI 


31 26 

25 21 

20 16 

15 

0 

DADDI 

01 1 000 

rs 

rt 

immediate 

6 

5 

5 

16 



Format: 

DADDI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. 

An overflow exception occurs if carries out of bits 62 and 63 differ (2’s 
complement overflow). The destination register rt is not modified when an 
integer overflow exception occurs. 

Operation: 

T: GPR [rt] <- GPR[rs] + (immediate 15 ) 48 II immediate 15 0 


Exceptions: 

Integer overflow exception 


A- 40 
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Doubleword Add 
Immediate Unsigned 




31 26 

25 21 

20 16 

15 

0 

DADDIU 

011001 

rs 

rt 

immediate 

6 

5 

5 

16 



Format: 

DADD1U rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of 
general register rs to form the result. The result is placed into general 
register rt. No integer overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADDI 
instruction is that DADDIU never causes an overflow exception. 

Operation: 

T: GPR [rt] <- GPR[rs] + (immediate-, 5 ) 48 li immediatei 5 0 


Exceptions: 

None 


A- 41 
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DADDU Doubleword Add Unsigned DADDU 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

DADDU 

000000 




00000 

101101 

6 

5 

5 

5 

5 

6 


Format: 

DADDU rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are added to form the result. The result is placed into general register rd. 
No overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADD instruction 
is that DADDU never causes an overflow exception. 

Operation: 

T: GPR[rd] <-GPR[rs] + GPR[rt] 


Exceptions: 

None 
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DDIV Doubleword Divide DDIV 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

0 


000000 



00 0000 0000 


6 

5 

5 

10 

6 


Format: 

DDIV rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as 2’s complement values. No 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to 
check for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 


Operation: 


T-2: LO 

<- undefined 

HI 

<- undefined 

T-1: LO 

undefined 

HI 

undefined 

T: LO 

GPR[rs] div GPR[rt] 

HI 

GPR[rs] mod GPR[rt] 


Exceptions: 

None 
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DDIVU Doubleword Divide Unsigned DDIVU 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

0 

DDIVU 

000000 



000000 0000 

011111 

6 

5 

5 

10 

6 


Format: 

DDIVU rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to 
check for a zero divisor. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 


Operation: 


T-2: LO 

<- undefined 

HI 

<- undefined 

T-1: LO 

<- undefined 

HI 

<- undefined 

T: LO 

<- (0 II GPR[rs]) div (0 II GPR[rt]) 

HI 

<- (0 11 GPR[rs]) mod (0 II GPR[rt]) 


Exceptions: 

None 
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DIV Divide DIV 


31 


26 25 


21 20 


16 15 


6 5 


SPECIAL 

000000 

rs 

rt 

0 

00 0000 0000 

DIV 

011010 

6 

5 

5 

10 

6 


Format: 

DIV rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as 2’s complement values. No 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

The operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to 
check for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO , and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 


Operation: 


T-2: LO 

<r- undefined 

HI 

<- undefined 

T— 1 : LO 

<— undefined 

HI 

<r- undefined 

T: q 

<- GPR[rs] 31 o div GPR[rt] 31 .. 0 

r 

*- GPRfrs] 31 o mod GPR[rt] 31 0 

LO 

(cteiL, M 31..0 

HI 

( r 3l) 2 II r 31..0 


Exceptions: 

None 
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DIVU Divide Unsigned DIVU 


31 26 

25 

21 

20 16 

15 6 

5 0 

SPECIAL 

000000 

rs 

rt 

0 

000000 0000 

DIVU 

011011 

6 


5 

5 

10 

6 


Format: 

DIVU rs, rt 

Description: 

The contents of general register rs are divided by the contents of 
general register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

The operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to 
check for a zero divisor. 

When the operation completes, the quotient word of the double result 
is loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 


Operation: 


T-2: LO 

<- undefined 

HI 

<— undefined 

T-1: LO 

«- undefined 

HI 

<- undefined 

T: q 

(0 II GPR[rs] 31 0 ) div (0 II GPR[rt] 31 .. 0 ) 

r 

(0 II GPR[rsj 31 0 ) mod (0 II GPR[rt] 3 i 0 ) 

LO 

Mq 3 i£fllq3i..o" 

HI 

( r 3l) 32 II r 31..0 


Exceptions: 

None 


A- 46 
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DMFCO System Control Coprocessor DMFCO 


31 26 25 21 20 16 15 11 10 0 


COPO 

DMF 

rt 

rd 

0 

010000 

00001 



000 0000 00 00 

6 

5 

5 

5 

11 


Format: 

DMFCO rt, rd 

Description: 

The contents of coprocessor register rd of the CPO are loaded into 
general register rt 

This operation is defined in kernel mode regardless of the setting of the 
Status. KX bit. Execution of this instruction with in supervisor mode with 
Status. SX = 0 or in user mode with UX = 0, causes a reserved instruction 
exception. All 64-bits of the general register destination are written from 
the coprocessor register source. The operation of DMFCO on a 32-bit 
coprocessor 0 register is undefined. 

Operation: 

T: data <-CPR[0,rd] 

T+1: GPR[rt] data 


Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception for supervisor mode with Status. SX = 0 
or user mode with Status.UX = 0. 


A- 47 
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DMTCO System Control Coprocessor DMTCO 


31 26 25 21 20 16 15 11 10 0 


COPO 

DMT 

rt 

rd 

0 

01 0000 

00101 



000 0000 00 00 

6 

5 

5 

5 

11 


Format: 

DMTCO rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register 
rd of the CPO. 

This operation is defined in kernel mode regardless of the setting of the 
Status. KX bit. Execution of this instruction with in supervisor mode with 
Status. SX = 0 or in user mode with UX = 0, causes a reserved instruction 
exception. 

All 64-bits of the coprocessor 0 register are written from the general 
register source. The operation of DMTCO on a 32-bit coprocessor 0 register 
is undefined. 

Because the state of the virtual address translation system may be 
altered by this instruction, the operation of load instructions, store 
instructions, and TLB operations immediately prior to and after this 
instruction are undefined. 

Operation: 

T: data <- GPR[rt] 

T+1 : CPR[0,rd] <- data 


Exceptions: 

Reserved instruction exception for supervisor mode with Status.SX = 0 
or user mode with Status.UX = 0. 


A -48 
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DMULT Doubleword Multiply DMULT 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

0 

DMULT 

000000 



00 0000 0000 

011100 

6 

5 

5 

10 

6 


Format: 

DMULT rs, rt 

Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 2’s complement values. No Integer overflow exception occurs 
under any circumstances. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 


Operation: 


T-2: LO 

<- undefined 

HI 

<- undefined 

T-1: LO 

<- undefined 

HI 

<- undefined 

T: t 

<- GPR[rs] * GPR[rt] 

LO 

* 63..0 

H 1 

* 127. .64 


Exceptions: 

None 
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DMULTU Unsigned DMULTU 


31 26 

25 21 

20 16 

15 6 

5 0 

SPECIAL 

rs 

rt 

0 

DMULTU 

000000 



00 0000 0000 

011101 

6 

5 

5 

10 

6 


Format: 

DMULTU rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two instructions. 

Operation: 

T-2: LO <- undefined 
HI <- undefined 
T-1 : LO <r- undefined 
HI undefined 

T: t <- (0 II GPR[rs]) * (0 II GPR[rt]) 

LO <— t63..o 
HI <-ti27..64 


Exceptions: 

None 


A- 50 
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DSLL Doubleword Shift Left Logical DSLL 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

0 

rt 

rd 

sa 

DSLL 

000000 

00000 




111000 


6 5 5 5 5 6 


Format: 

DSLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting 
zeros Into the low-order bits. The result is placed in register rd. 

Operation: 

T: s <- 0 II sa 

GPR[rd] <- GPR[rt]( 63 _ s) o II 0 s 


Exceptions: 

None 
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r\ O I I if Doubleword Shift Left r%OI I \t 
UOLLV Logical Variable UoLLV 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

DSLLV 

000000 




00000 

010100 

6 

5 

5 

5 

5 

6 


Format: 

DSLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left by the number of bits 
specified by the low-order six bits contained in general register rs, inserting 
zeros into the low-order bits. The result is placed in register rd. 

Operation: 

T: s<-GPR[rs] 5 .. 0 

GPR[rd]<- GPR[ft]( 63 _ s) o II 0 s 


Exceptions: 

None 
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DSLL32 


Doubleword Shift Left 
Logical + 32 


DSLL32 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

0 

rt 

rd 

sa 

DSLL32 

000000 

00000 




111100 

6 

5 

5 

5 

5 

6 


Format: 

DSLL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted left by 32+sa bits, 
inserting zeros into the low-order bits. The result is placed in register rd. 

Operation: 

T: s < — 1 1 1 sa 

GPR[rd]<- GPR[rt] (6 3 _s)..o II 0 s 


Exceptions: 

None 
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nCDA Doubleword nCDA 

UoKA Shift Right Arithmetic UOPlA 


CO 

CM 

T“ 

CO 

25 21 

20 16 

15 11 

10 6 

5 0 

SPECIAL 

0 

rt 

rd 

sa 

DSRA 

000000 

00000 




1110 11 

6 

5 

5 

5 

5 

6 


Format: 

DSRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. The result is placed in register rd. 

Operation: 

T: s 4 - 0 II sa 

GPR[rd] 4 - (GPR[rt] 63 ) s II GPR[rt] 63 .. s 


Exceptions: 

None 
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nonA\f Doubleword Shift Right r>ODAW 

DSRAV Arithmetic Variable DSRAV 

I 31 26 25 21 20 16 15 11 10 6~~5 0~| 


SPECIAL 

rs 

rt 

rd 

0 

DSRAV 

000000 




00000 

010111 

6 

5 

5 

5 

5 

6 


Format: 

DSRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order six bits of general register rs, sign-extending 
the high-order bits. The result is placed in register rd. 

Operation: 

T: s <- GPR[rs] 5 o 

GPR[rd] ^ (GPR[rt] 63 ) s II GPR[rt] 63 .. s 


Exceptions: 

None 
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DSRA32 Arithmetic + 32 ^ DSRA32 

I 31 26 25 21 20 16 15 11 10 6 5 (Tl 


SPECIAL 

0 

It 

rd 

sa 


000000 

00000 





6 

5 

5 

5 

5 

6 


Format: 

DSRA32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits, sign- 
extending the high-order bits. The result is placed in register rcL 

Operation: 

T: s 4 — 1 II sa 

GPR[rd] <- (GPR[rt] 63 ) s II GPR[rt] 63 .. s 


Exceptions: 

None 
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DSRL 


Doubleword 
Shift Right Logical 


DSRL 


31 26 

25 21 

20 16 

15 11 

10 6 

5 0 

SPECIAL 

0 

rt 

rd 

sa 


000000 

00000 





6 

5 

5 

5 

5 

6 


Format: 

DSRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is placed in register rd. 

Operation: 

T: s<-Ollsa 

GPR[rd] <- 0 s II GPR[rt] 63 .. s 


Exceptions: 

None 
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DSRLV 


Doubleword Shift Right 
Logical Variable 


DSRLV 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

DSRLV 

000000 




00000 

010110 


6 5 5 5 5 6 


Format: 

DSRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order six bits of general register rs, inserting zeros 
into the high-order bits. The result is placed in register rd. 

Operation: 

T: s <- GPR[rs] 5 ..o 

GPR[rd] <- 0 s II GPR[rt] 63 .. s 


Exceptions: 

None 
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DSRL32 Logical + 32 ^ DSRL32 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

0 

rt 

rd 

sa 

DSRL32 

000000 

00000 




111110 


6 5 5 5 5 6 


Format: 

DSRL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits, 
inserting zeros into the high-order bits. The result is placed in register rd. 

Operation: 

T: s 1 II sa 

GPR[rd] <- 0 s II GPRIrtle^s 


Exceptions: 

None 
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DSUB Doubleword Subtract DSUB 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

DSUB 

000000 




00000 

101110 


6 5 5 5 5 6 


Format: 

DSUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUBU 
instruction is that DSUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 62 
and 63 differ (2’s complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

Operation: 


T: GPR[rd] <- GPR[rs] - GPR[rt] 


Exceptions: 

Integer overflow exception 
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DSUBU Doubleword Subtract Unsigned DSUBU 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 


000000 




00000 



6 5 5 5 5 6 


Format: 

DSUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd 

The only difference between this instruction and the DSUB instruction 
is that DSUBU never traps on overflow. No integer overflow exception 
occurs under any circumstances. 

Operation: 


T: GPR[rd] <- GPR[rs] - GPR[rt] 


Exceptions: 

None 
















CPU Instruction Set Details 


Appendix A 


ERET Exception Return ERET 


31 26 25 24 6 5 0 


COPO 

CO 

0 

ERET 

01 0000 

1 

000 0000 0000 0000 0000 

011000 

6 

1 

19 

6 


Format: 

ERET 

Description: 

ERET is the R4600 instruction for returning from an interrupt, 
exception, or error trap. Unlike a branch or jump instruction, ERET does 
not execute the next instruction. 

ERET must not itself be placed in a branch delay slot. 

If the processor is servicing an error trap ( SR 2 = 1), then load the PC 
from the ErrorEPC and clear the ERL bit of the Status register (SR 2 ). 
Otherwise (SR 2 = 0). load the PC from the EPC, and clear the EXL bit of the 
Status register (SRj). 

An ERET executed between a LL and SC also causes the SC to fail. 


Operation: 



Exceptions: 

Coprocessor unusable exception 


A- 62 











CPU Instruction Set Details 


Appendix A 


j 


Jump 


J 



Format: 

J target 

Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program 
unconditionally jumps to this calculated address with a delay of one 
instruction. 

Operation: 

T : temp <- target 

T+1: PC < — PCg3 28 ^ temp II 0 ^ 


Exceptions: 

None 
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JAL Jump And Link JAL 


31 26 25 0 


JAL 

target 

00001 1 



6 26 


Format: 

JAL target 

Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program 
unconditionally jumps to this calculated address with a delay of one 
instruction. The address of the instruction after the delay slot is placed in 
the link register, t 31 . 

Operation: 

T: temp <- target 

GPR[31]<- PC + 8 
T+1 : PC <- PC 63 28 H temp II 0 2 


Exceptions: 

None 
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JALR Jump And Link Register JALR 


31 26 

25 21 

20 16 

15 11 

10 6 

5 0 

SPECIAL 

rs 

0 

rd 

0 

JALR 

000000 


00000 


00000 

001001 

6 

5 

5 

5 

5 

6 


Format: 

JALR rs 

JALR rd, rs 

Description: 

The program unconditionally jumps to the address contained in 
general register rs, with a delay of one instruction. The address of the 
instruction after the delay slot is placed in general register rcL The default 
value of rd, if omitted in the assembly language instruction, is 31. 

Register specifiers rs and rd may not be equal, because such an 
instruction does not have the same effect when re-executed. However, an 
attempt to execute this instruction is not trapped, and the result of 
executing such an instruction is undefined. 

Since instructions must be word-aligned, a Jump and Link Register 
instruction must specify a target register (rs) whose two low-order bits are 
zero. If these low-order bits are not zero, an address exception will occur 
when the jump target instruction is subsequently fetched. 


Operation: 


T: 

temp <- GPR [rs] 


GPR[rd] PC + 8 

T+1: 

PC <- temp 


Exceptions: 

None 
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JR Jump Register JR 


31 26 25 21 20 6 5 0 


SPECIAL 

rs 

0 

JR 

000000 


000 0000 0000 0000 

001000 

6 

5 

15 

6 


Format: 

JR rs 

Description: 

The program unconditionally jumps to the address contained in 
general register rs, with a delay of one instruction. 

Since instructions must be word-aligned, a Jump Register instruction 
must specify a target register (rs) whose two low-order bits are zero. If these 
low-order bits are not zero, an address exception will occur when the jump 
target instruction is subsequently fetched. 


Operation: 


T: 

temp GPR[rs] 

T+1: 

PC <— temp 


Exceptions: 

None 
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LB Load Byte LB 


31 26 

25 21 

20 16 

15 0 

LB 

base 

rt 

offset 

100000 




6 

5 

5 

16 


Format: 

LB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are sign-extended and 
loaded into general register rt. 

Operation: 

T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) 4- AddressTranslation (vAddr, DATA) 
pAddr 4- pAddrpsizE - 1 .. 3 II (pAddr 20 xor ReverseEndian 3 ) 
mem 4- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
byte 4- vAddr 2 0 xor BigEndianCPU 3 
GPR[rt] 4- (mem 7+8 . byte ) 56 II mem 7 +8 . byte i8 . byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LBU Load Byte Unsigned LBU 


31 26 

25 21 

20 16 

15 0 

LBU 

base 

rt 

offset 

100100 




6 

5 

5 

16 


Format: 

LBU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are zero-extended and 
loaded into general register rt 

Operation: 

T: vAddr <— ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE- 1..3 H (pAddr 20 xor ReverseEndian 3 ) 
mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
byte <- vAddr 2 ..o xor BigEndianCPU 3 
GPR[rt] <- 0 56 II memy+g. byte .. 8 * byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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Load Doubleword 



31 26 

25 21 

20 16 

15 0 

LD 

base 

rt 

offset 

110111 




6 

5 

5 

16 


Format: 

LD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the 64-bit 
doubleword at the memory location specified by the effective address are 
loaded into general register rt 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 .. 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 

mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 

GPR[rt] <- mem 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LDCz Load Doubleword To Coprocessor LDCz 


31 26 

25 21 

20 16 

15 0 

LDCz 

base 

rt 

offset 

1 1 0 1 x x* 




6 

5 

5 

16 


Format: 

LDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a doubleword 
from the addressed memory location and makes the data available to 
coprocessor unit z. The manner in which each coprocessor uses the data 
is defined by the individual coprocessor specifications. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

This instruction is not valid for use with CPO. 

This instruction is undefined when the least-significant bit of the 
rt field is non-zero. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Operation: 


T : vAddr <— ((offset^) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressT ranslation (vAddr, DATA) 

mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 

COPzLD (rt, mem) 


Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

Opcode Bit Encoding: 


LDCz Bit #3i 

30 

29 

28 

27 

26 

0 

LDCll 1 

1 | 

0 

1 | 

0 

1 


Bit #31 

30 

29 

28 

27 

26 

0 

LDC2 | 1 | 

1 1 

0 1 

1 | 

1 

0 


V ' 

Opcode 

— ^ 

J 

Coprocessor Unit Number 
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LDL Load Doubleword Left LDL 


31 26 25 21 20 1615 0 


LDL 

base 

rt 

offset 

011010 




6 

5 

5 

16 


Format: 

LDLrt, offset(base) 

Description: 

This instruction can be used in combination with the LDR instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a doubleword boundary. LDL loads the left portion of the 
register with the appropriate part of the high-order doubleword: LDR loads 
the right portion of the register with the appropriate part of the low-order 
doubleword. 

The LDL instruction adds its sign-extended 16-bit offset to the contents 
of general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the doubleword in memory which 
contains the specified starting byte. From one to eight bytes will be loaded, 
depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the low-order byte of the 
doubleword in memory. The least-significant (right-most) byte(s) of the 
register will not be changed. 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDL (or LDR) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
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Operation: 

T : vAddr <- ((offset ^) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddr PS | 2 E-i ..3 II (pAddr 2 0 xor ReverseEndian 3 ) 

if BigEndianMem = 0 then 

pAddr <- pAddr PS | Z E_i 3 II 0 3 

endif 

byte 4 - vAddr 20 xor BigEndianCPU 3 

mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA) 

GPR[rt] 4 - mem 7 + 8 *ijyt e , o H GPR[rt] 5 5 _ a *byt e ..o 


Given a doubleword in a register and a doubleword in memoiy, the 
operation of LDL is as follows: 


LDL 


Register 

D 

B 

C 

D 

E 

F 

G 

H 




Memory 

D 

D 


n 

M 

N 

0 

P 






vAddr 2 .. 0 

BigEndianCPU = 0 

BigEndianCPU = 

1 

destination 

type 

offset 

destination 

type 

offset 

LEM BEM 

LEM BEM 

0 

P BCDEFGH 

0 

0 7 

1 J K L MN 0 P 

7 

0 0 

1 

OPCDEFGH 

1 

0 6 

JKLMNOPH 

6 

0 1 

2 

NOPDEFGH 

2 

0 5 

KL MNOPGH 

5 

0 2 

3 

MNOPEFGP 

3 

0 4 

L MNOPFGH 

4 

0 3 

4 

L MNOPFGH 

4 

0 3 

MNOPEFGH 

3 

0 4 

5 

KLMNOPGH 

5 

0 2 

NOPDEFGH 

2 

0 5 

6 

J K L MN 0 P H 

6 

0 1 

OPCDEFGH 

1 

0 6 

7 

1 J K L M N O P 

7 

0 0 

PBCDEFGH 

0 

0 7 


LEAfLittle-endian memory (BigEndianMem = 0) 
BEMBigEndianMem = 1 

TlypeAccessType (see Table 2. 1 on page 3) sent to memory 
CyfsetpAddr 2 ..o sent to memoiy 

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LDR Load Doubleword Right LDR 


31 26 

25 21 

20 16 

15 0 

LDR 

base 

rt 

offset 

011011 




6 

5 

5 

16 


Format: 

LDR rt, offset(base) 

Description: 

This instruction can be used in combination with the LDL instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a doubleword boundary. LDR loads the right portion of the 
register with the appropriate part of the low-order doubleword: LDL loads 
the left portion of the register with the appropriate part of the high-order 
doubleword. 

The LDR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the doubleword in 
memory which contains the specified starting byte. From one to eight 
bytes will be loaded, depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register: then it loads bytes 
from memory into the register until it reaches the high-order byte of the 
doubleword in memory. The most significant (left-most) byte(s) of the 
register will not be changed. 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDR (or LDL) 
instruction which also specifies register rt 

No address exceptions due to alignment are possible. 
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Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i..3 N (pAddr 2 . 0 xor ReverseEndian 3 ) 

if BigEndianMem = 1 then 

pAddr 4- pAddr 31 3 II 0 3 

endif 

byte 4- vAddr 2 .. 0 xor BigEndianCPU 3 

mem 4- LoadMemory (uncached, byte, pAddr, vAddr, DATA) 

GPR[rt] 4- GPR[rt] 63 6 4 _ 8 . byt6 II mem 63 8 . byte 


Given a doubleword in a register and a doubleword in memory, the 
operation of LDR is as follows: 


LDR 


Register 

A 

B 

C 

D 

E 

F 

G 

H 




Memory 

1 

J 

K 

L 

M 

N 

0 

P 





vAddr 20 

BigEndianCPU = 0 

BigEndianCPU = 

1 

destination 

type 

offset 

destination 


offset 

LEM BEM 

type 

LEM BEM 

0 

1 J K L M N O P 


0 0 

A B C D E F G 1 

0 

7 0 

1 

A 1 J K L MN 0 


1 0 

A B C D E F 1 J 

i 

6 0 

2 

A B 1 J K L M N 


2 0 

A B C D E 1 J K 

2 

5 0 

3 

A B C 1 J K L M 


3 0 

A B C D 1 J K L 

3 

4 0 

4 

A B C D 1 J K L 


4 0 

A B C 1 J K L M 

4 

3 0 

5 

A B C D E 1 J K 


5 0 

A B 1 J K L M N 

5 

2 0 

6 

A B C D E F 1 J 

1 

6 0 

A 1 J K L MN O 

6 

1 0 

7 

A B C D E F G 1 

0 

7 0 

1 J K L MN 0 P 

7 

0 0 


LEMLittle-endian memory (BigEndianMem = OJ 
BEAfBigEndianMem = 1 

Tt/peAccessType (see Table 2.1 on page 3) sent to memory 
0#setpAddr 2 ..o sent to memory 

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LH Load Halfword LH 


31 26 

CM 

LD 

CM 

20 16 

15 0 

LH 

base 

rt 

offset 

100001 




6 

5 

5 

16 


Format: 

LH rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the effective address are sign-extended and 
loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an 
address error exception occurs. 

Operation: 

T: vAddr <- ((offset^) 48 II offset 15 ., 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE- 1..3 H (pAddr 2i 0 xor (ReverseEndian II 0)) 
mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 0 xor (BigEndianCPU 2 II 0) 

GPR[rt] <- (mem 15+8 * byte ) 16 II mem 15+8 . byte 8 . byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LHU Load Halfword Unsigned LHU 


31 26 

25 21 

20 16 

15 0 

LHU 

base 

rt 

offset 

100101 




6 

5 

5 

16 


Format: 

LHU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the effective address are zero-extended and 
loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an 
address error exception occurs. 

Operation: 

T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr 4- pAddrpsizE - 1..3 II (pAddr 2 . 0 xor (ReverseEndian 2 II 0)) 
mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 O xor (BigEndianCPU 2 II 0) 

GPR[rt] <-0 48 II mem 15+8 * byte 8 * byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus Error exception 
Address error exception 
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LL Load Linked LL 


31 26 

25 21 

20 16 

15 0 

LL 

base 

rt 

offset 

110000 




6 

5 

5 

16 


Format: 

LL rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt The loaded word is sign-extended. 

This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LL must access memory 
before the LL, and loads and stores to shared memory fetched subsequent 
to the LL must access memory after the LL. The processor begins checking 
the accessed word for modification by other processors and devices. 

Load Linked and Store Conditional can be used to atomically update 
memory locations as shown: 


LI; 


LL 

Tl, (TO) 

ADD 

T2, Tl, 1 

sc 

T2, (TO) 

BEQ 

T2, O, LI 

NOP 



This atomically increments the word addressed by TO. Changing the 
ADD to an OR changes this to an atomic bit set. 

This instruction is available in User mode, and it is not necessary for 
CPO to be enabled. 

The operation of LL is undefined if the addressed location is uncached 
and, for synchronization between multiple processors, the operation of LL 
is undefined if the addressed location is noncoherent. A cache miss that 
occurs between LL and SC may cause SC to fail, so no load or store 
operation should occur between LL and SC, otherwise the SC may never 
be successful. Exceptions also cause SC to fail, so persistent exceptions 
must be avoided. 

If either of the two least-significant bits of the effective address are non- 
zero, an address error exception takes place. 


A- 77 








CPU Instruction Set Details 


Appendix A 


Operation: 


T: vAddr ((offset 15 ) 48 II offset 15 .. 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i..3 H (pAddr 2 0 xor (ReverseEndian II 0 2 )) 
mem LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 0 xor (BigEndianCPU II 0 2 ) 

GPR[rt] <- (mem 31+8 . byte ) 32 II mem 31+8 . byte 8 . byte 

LLbit <- 1 

SyncOperationQ 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 


A- 78 






CPU Instruction Set Details 


Appendix A 


LLD Load Linked Doubleword LLD 


31 26 

25 21 

20 16 

15 0 

LLD 

base 

rt 

offset 

110100 




6 

5 

5 

16 


Format: 

LLD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the doubleword at 
the memoiy location specified by the effective address are loaded into 
general register rt. 

This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LLD must access memoiy 
before the LLD, and loads and stores to shared memory fetched 
subsequent to the LLD must access memoiy after the LLD. The processor 
begins checking the accessed doubleword for modification by other 
processors and devices. 

Load Linked Doubleword and Store Conditional Doubleword can be 
used to atomically update memoiy locations: 


LI: 


LLD 

Tl, (TO) 

ADD 

T2, Tl, 1 

SCD 

T2, (TO) 

BEQ 

T2, 0, LI 

NOP 



This atomically increments the word addressed by TO. Changing the 
ADD to an OR changes this to an atomic bit set. 

The operation of LLD is undefined if the addressed location is 
uncached and, for synchronization between multiple processors, the 
operation of LLD is undefined if the addressed location is noncoherent. A 
cache miss that occurs between LLD and SCD may cause SCD to fail, so 
no load or store operation should occur between LLD and SCD, otherwise 
the SCD may never be successful. Exceptions also cause SCD to fail, so 
persistent exceptions must be avoided. 

This instruction is available in User mode, and it is not necessary for 
CPO to be enabled. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
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Operation: 

T: vAddr «- ((offset-^) 48 II offset 15 -0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
GPR[rt] <- mem 
LLbit <- 1 
SyncOperationQ 


Exceptions: 

TLB refill exception 
TLB Invalid exception 
Bus error exception 
Address error exception 
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LUI 


Load Upper Immediate 


LUI 


31 26 

25 21 

20 16 

15 0 

LUI 

0 

rt 

immediate 

001111 

00000 



6 

5 

5 

16 


Format: 

LUI rt, immediate 

Description: 

The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits 
of zeros. The result is placed into general register rt. The loaded word is 
sign-extended. 

Operation: 


T: GPR[rt] <- (immediate 15 ) 32 II immediate II 0 16 


Exceptions: 

None 
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LW 


Load Word 


LW 


31 26 

25 21 

20 16 

15 0 

LW 

base 

rt 

offset 

100011 




6 

5 

5 

16 


Format: 

LW rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt The loaded word is sign-extended. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddr PS | 2 E-i ..3 II (pAddr 2 .. 0 xor (ReverseEndian II 0 2 )) 
mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 0 xor (BigEndianCPU II 0 2 ) 

GPR[rt] <- (mem 31+8 . byte ) 32 II mem 31+8 . byte 8 . byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWCz Load Word To Coprocessor LWCz 


31 26 

25 21 

20 16 

15 0 

LWCz 

base 

rt 

offset 

1 1 0 0 x x* 




6 

5 

5 

16 


Format: 

LWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a word from 
the addressed memory location, and makes the data available to 
coprocessor unit z. 

The manner in which each coprocessor uses the data is defined by the 
individual coprocessor specifications. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

This instruction is not valid for use with CPO. 

NOTE: *See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 -0 ) + GPR[base} 

(pAddr, uncached)<- AddressTranslation (vAddr, DATA) 
pAddr<- pAddrpsizE-i .,3 H (pAddr 2 0 xor (ReverseEndian II 0 2 )) 
mem <— LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 0 xor (BigEndianCPU II 0 2 ) 

COPzLW (byte] rt, mem) 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding: 


t #31 

30 

29 

28 

27 

26 

0 

1 

| 1 

1 0 

1 0 

0 

1 1 


#31 

30 

29 

28 

27 

26 

0 

1 1 

1 | 

0 | 

0 | 

1 

0 | 



LWCz 


LWC1 


LWC2 


Opcode 


Coprocessor Unit Number 
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LWL Load Word Left LWL 


31 26 

25 21 

20 16 

15 0 

LWL 

base 

rt 

offset 

1 0001 0 




6 

5 

5 

16 


Format: 

LWL rt, offset(base) 

Description: 

This instruction can be used in combination with the LWR instruction 
to load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWL loads the left portion of the register with the 
appropriate part of the high-order word; LWR loads the right portion of the 
register with the appropriate part of the low-order word. 

The LWL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will be 
loaded, depending on the starting byte specified. The loaded word is sign- 
extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the low-order byte of the 
word in memory. The least-significant (right-most) byte(s) of the register 
will not be changed. 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWL (or LWR) 
instruction which also specifies register rt 

No address exceptions due to alignment are possible. 
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Operation: 


T: vAddr <- ((offset^) 48 II offset 5 0 ) + GPR[base] 

(pAddr, uncached) <— AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i..3 1 1 (pAddr 20 xor ReverseEndian 3 ) 
if BigEndianMem = 0 then 

pAddr <- pAddr PS | 2E _ 13 II 0 3 

end if 

byte <- vAddr 10 xor BigEndianCPU 2 
word vAddr 2 xor BigEndianCPU 

mem LoadMemory (uncached, 0 II byte, pAddr, vAddr, DATA) 
temp <- nTiem 31+ 3 2 . word . 8 . byte 32 . word II GPR[rt] 23 . 8 *byte..o 

GPR[rt] <- (temp 31 ) 32 II temp 


Given a doubleword in a register and a doubleword in memory, the 
operation of LWL is as follows: 


LWL 


Register 

A 

B 

C 

D 

E 

F 

G 

H 




Memory 

1 

J 

K 

L 

M 

N 

0 

P 



1 


vAddr 2 ..o 

BigEndianCPU = 0 



BigEndianCPU = 

1 


destination 

type 

offset 

destination 

type 

offset 

LEM 

BEM 

LEM 

BEM 


s 

s 

s 

S P F G H 

0 


7 

s s 

s 

S 1 J K 

L 

3 

4 

0 


s 

s 

s 

S O P G H 

1 


6 

s s 

s 

S J K L 

H 

2 

4 

1 


s 

s 

s 

S N 0 P H 

2 


5 

s s 

s 

S K L G 

H 

1 

4 

2 


s 

s 

s 

S M N 0 P 

3 


4 

s s 

s 

S L F G 

H 

0 

4 

3 

4 

s 

s 

s 

S L F G H 

0 

4 

3 

s s 

s 

S M N 0 

P 

3 

0 

4 

5 

s 

s 

s 

S K L G H 

1 

4 

2 

s s 

s 

S N O P 

H 

2 

0 

5 

6 

s 

s 

s 

S J K L H 

2 

4 

1 

s s 

s 

S O P G 

H 

1 

0 

6 

7 

s 

s 

s 

S 1 J K L 

3 

4 

0 

s s 

s 

S P F G 

H 

0 

0 

7 


Key to table: 

LEMLittle-endian memory (BigEndianMem = 0) 
BEJWBigEndianMem = 1 

TypeAccessType (see Table 2.1 on page 3) sent to memory 
Q#setpAddr 2 . 0 sent to memory 
Ssign-extend of destination ! 

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWR Load Word Right LWR 


31 26 25 21 20 16 15 0 


LWR 

base 

rt 

offset 

100110 





5 

5 

16 


Format: 

LWR rt, offset(base) 

Description: 

This instruction can be used in combination with the LWL instruction 
to load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWR loads the right portion of the register with 
the appropriate part of the low-order word: LWL loads the left portion of 
the register with the appropriate part of the high-order word. 

The LWR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which can 
specify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will be 
loaded, depending on the starting byte specified. The loaded word is sign- 
extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register; then it loads bytes 
from memory into the register until it reaches the high-order byte of the 
word in memory. The most significant (left-most) byte(s) of the register will 
not be changed. 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWR (or LWL) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 
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Operation: 


T: vAddr <- ((offset^) 48 II offset i 50 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr pAddr P s i 7 F _i 3 II (pAddr 2 0 xor ReverseEndian 3 ) 
if BigEndianMem = 1 then 

pAddr <- pAddrpsizE- 3 i.. 3 1 1 ° 3 

endif 

byte <- vAddr^.o xor BigEndianCPU 2 
word <- vAddr 2 xor BigEndianCPU 

mem LoadMemory (uncached, 0 II byte, pAddr, vAddr, DATA) 

temp <r- GPR[rt] 3 i 3o. 8 »byte..0 H mem 3l+32*word-32*word+8*byte 
GPR[rt] <- (tempsi) 32 II temp 


Given a word in a register and a word in memoiy, the operation of LWR 
is as follows: 


LWR 


Register 

A 

B 

C 

D 

E 

F 

G 

H 




Memory 

1 

J 

K 

L 

M 

N 

O 

P 






Key to table: 

LEMLittle-endian memoiy (BigEndianMem = 0) 
BEiWBigEndianMem = 1 

TlypeAccessType (see Table 2.1 on page 3) sent to memoiy 
OffsetpAddr 2 "0 sent to memoiy 
Ssign-extend of destination j 

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWU 


Load Word Unsigned 


LWU 


31 26 

25 21 

20 16 

15 0 

LWU 

base 

rt 

offset 

101111 





5 

5 

16 


Format: 

LWU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. The loaded word is zero-extended. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset^) 48 II offset, 50 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i ..3 1 1 (pAddr 20 xor (ReverseEndian II 0 2 )) 
mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
byte vAddr 2 . o xor (BigEndianCPU II 0 2 ) 

GPR[rt] <- 0 32 II mem 31+8 . byte . , 8 . byte 


Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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MFCO System Control Coprocessor MFCO 



Format: 

MFCO rt, rd 

Description: 

The contents of coprocessor register rd of the CPO are loaded into 
general register rt May be used on both 32-bit and 64-bit CPO registers. 

Operation: 

T: data <- CPR[0,rd] 

T+1: GPR[rt] <- (data 31 ) 32 II data 31 0 

Exceptions: 

Coprocessor unusable exception 
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MFCz Move From Coprocessor MFCz 



Note: *See the table “Opcode Bit Encoding” on next page, or “CPU 
Instruction Opcode Bit Encoding” at the end of Appendix A. 

Format: 

MFCz rt, rd 

Description: 

The contents of coprocessor register rd of coprocessor z are loaded into 
general register rt. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

Operation: 

T: ifrd 0 = Othen 

data 4— CPR[z,rd 4 i II 0] 31 0 

else 

data 4- CPR[z,rd 4 i II 0] 63 32 

endif 

T+1: GPR[rt] 4- (data 31 ) 32 II data 


Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 


Opcode Bit Encoding: 
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MFHI Move From HI MFHI 


31 26 25 16 15 11 10 6 5 0 


SPECIAL 

0 

id 

0 

MFHI 

000000 

00 0000 0000 


00000 

010000 

6 

10 

5 

5 

6 


Format: 

MFHI rd 

Description: 

The contents of special register HI are loaded into general register rcL 
To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFHI instruction may not be any of the 
instructions which modify the HI register: MULT, MULTU, DIV, DIVU, 
MTHI, DMULT, DMULTU, DDIV, DDIVU. 

Operation: 


T: GPR[rd] <- HI 


Exceptions: 

None 
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MFLO Move From Lo MFLO 



Format: 

MFLO rd 

Description: 

The contents of special register LO are loaded into general register rd. 
To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFLO instruction may not be any of the 
instructions which modify the LO register: MULT, MULTU, DIV, DIVU, 
MTLO, DMULT, DMULTU, DDIV, DDIVU. 

Operation: 

T: GPR[rd] LO 


Exceptions: 

None 
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MTCO 


Move To 

System Control Coprocessor 


MTCO 


31 26 25 21 20 16 15 11 10 0 


COPO 

MT 

rt 

rd 

0 

01 0000 

00 100 



000 0000 00 00 

6 

5 

5 

5 

11 


Format: 

MTCO rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register 
rd ofCPO. 

Because the state of the virtual address translation system may be 
altered by this instruction, the operation of load instructions, store 
instructions, and TLB operations immediately prior to and after this 
instruction are undefined. 


Operation: 


T: 

data <- GPRfrt] 

T+1: 

CPR[0,rd] <- data 


Exceptions: 

Coprocessor unusable exception 
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MTCz Move To Coprocessor MTCz 



Format: 

MTCz rt, rd 


Description: 

The contents of general register rt are loaded into coprocessor register 
rd of coprocessor z. Execution of the instruction referencing coprocessor 
3 causes a reserved instruction exception, not a coprocessor unusable 
exception. 

Operation: 

T: data <- GPR[rt] 3 i 0 
T+1: if rd 0 = 0 

CPR[z,rd 4 ! II 0] <- CPR[z, rd 4 , II 0] 63 32 II data 

0|S0 

CPR[z,rd 4-1 II 0] <- data II CPR[z,rd 4 -1 II 0] 31 0 
endif 


Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

♦Opcode Bit Encoding: 
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MTHI Move To HI MTHI 


31 26 25 21 20 6 5 0 


SPECIAL 

rs 

0 

MTHI 

000000 


000 000000000000 

01 0001 

6 

5 

15 

6 


Format: 

MTHI rs 

Description: 

The contents of general register rs are loaded into special register HI. 
If a MTHI operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register LO are undefined. 

Operation: 


T-2: HI <- undefined 
T-1: HI <- undefined 
T: HI GPR[rs] 


Exceptions: 

None 
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MTLO Move To LO MTLO 


31 26 25 21 20 6 5 0 


SPECIAL 

rs 

0 

MTLO 

000000 


000000000000000 

010011 

6 

5 

15 



Format: 

MTLO rs 

Description: 

The contents of general register rs are loaded into special register LO. 
If a MTLO operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register HI are undefined. 

Operation: 


T-2: LO undefined 
T— 1 : LO <- undefined 
T: LO <- GPR[rs] 


Exceptions: 

None 
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MULT Multiply MULT 



Format: 

MULT rs, it 

Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 32-bit 2’s complement values. No integer overflow exception 
occurs under any circumstances. The operands must be valid 32-bit, sign- 
extended values. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 


Operation: 


T-2: LO 

4 - undefined 

HI 

<- undefined 

T-1: LO 

<- undefined 

HI 

4 - undefined 

T: t 

LO 

<- GPRJ[ s ]3i..o * GPR[rt] 31 o 

(*3l)^ H *31. .0 

HI 

(W 2 M *63.. 32 


Exceptions: 

None 
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MULTU Multiply Unsigned MULTU 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

0 

MULTU 

000000 



00 0000 0000 

011001 

6 

5 

5 

10 

6 


Format: 

MULTU rs, rt 

Description: 

The contents of general register rs and the contents of general register 
rt are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. The operands must be valid 
32-bit, sign-extended values. 

When the operation completes, the low-order word of the double result 
is loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results 
of these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two instructions. 

Operation: 


T-2: LO 

<- undefined 

HI 

<- undefined 

T— 1 : LO 

undefined 

HI 

<- undefined 

T: t 

<- (0 II GPR[rs] 31 ..o) * (0 II GPR[rt] 3 i 0 ) 

LO 

<-(t 3 i)^l | t 3 i..o 

HI 

(t63) II *63.. 32 


Exceptions: 

None 
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NOR Nor NOR 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

NOR 

000000 


bhHhM 



100111 


6 5 5 5 5 6 


Format: 

NOR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical NOR operation. The result is placed 
into general register rd. 

Operation: 


T: GPR[rd] «- GPR[rs] nor GPR[rt] 


Exceptions: 

None 
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OR 0r OR 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 

rt 

rd 

0 

OR 

000000 




00000 

100101 

6 

5 

5 

5 

5 

6 


Format: 

OR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical OR operation. The result is placed 
into general register rd. 

Operation: 


T: GPR[rd] <- GPR[rs] or GPR[rt] 


Exceptions: 

None 
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ORI Or Immediate ORI 


31 26 25 21 20 16 15 0 


ORI 

rs 

rt 

immediate 

001101 




6 

5 

5 

16 


Format: 

ORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical OR operation. The result is placed 
into general register rt. 

Operation: 


T: GPR[rt] GPR[rs] 63 16 II (immediate or GPR[rs] 15 0 ) 


Exceptions: 

None 
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SB Store Byte SB 


31 26 

25 21 

20 16 

15 0 

SB 

base 

rt 

offset 

1 01 000 




6 

5 

5 

16 


Format: 

SB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The least-significant byte of 
register rt is stored at the effective address. 

Operation: 

T: vAddr <- ((offset^) 48 lloffset 150 ) +GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i ..3 H (pAddr 20 xor ReverseEndian 3 ) 
byte <- vAddr 2 0 xor BigEndianCPU 3 
data <- GPR[rt] 63 _ 6 . byte .. 0 II 0 8 * b y‘ e 
StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA) 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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sc Store Conditional SC 


31 26 

25 21 

20 16 

15 0 

SC 

base 

rt 

offset 

111000 




6 

5 

5 

16 


Format: 

SC it, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

This instruction implicitly performs a SYNC operation; loads and 
stores to shared memoiy fetched prior to the SC must access memory 
before the SC; loads and stores to shared memory fetched subsequent to 
the SC must access memory after the SC. 

If any other processor or device has modified the physical address 
since the time of the previous Load Linked instruction, or if an ERET 
instruction occurs between the Load Linked instruction and this store 
instruction, the store fails and is inhibited from taking place. 

The success or failure of the store operation (as defined above) is 
indicated by the contents of general register rt after execution of the 
instruction. A successful store sets the contents of general register rt to 1; 
an unsuccessful store sets it to 0. 

The operation of Store Conditional is undefined when the address is 
different from the address used in the last Load Linked. 

This instruction is available in User mode; it is not necessary for CPO 
to be enabled. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception takes place. 

If this instruction should both fail and take an exception, the exception 
takes precedence. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 ., 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrp S |ZE.i 3 II (pAddr? 0 xor (ReverseEndian II 0 2 )) 
data <- GPR[rt] 63 . 8 . byte .. 0 II O 8 *^" 
if LLbit then 

StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 
endif 

GPR[rt]<- 0 63 II LLbit 
SyncOperation() 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SCD Store Conditional Doubleword SCD 


31 26 

25 21 

20 16 

15 0 

SCD 

base 

rt 

offset 

111100 




6 

5 

5 

16 


Format: 

SCD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

This instruction implicitly performs a SYNC operation: loads and 
stores to shared memory fetched prior to the SCD must access memory 
before the SCD: loads and stores to shared memory fetched subsequent to 
the SCD must access memory after the SCD. 

If any other processor or device has modified the physical address 
since the time of the previous Load Linked Doubleword instruction, or if 
an ERET instruction occurs between the Load Linked Doubleword 
instruction and this store instruction, the store fails and is inhibited from 
taking place. 

The success or failure of the store operation (as defined above) is 
indicated by the contents of general register rt after execution of the 
instruction. A successful store sets the contents of general register rt to 1 ; 
an unsuccessful store sets it to 0. 

The operation of Store Conditional Doubleword is undefined when the 
address is different from the address used in the last Load Linked 
Doubleword. 

This instruction is available in User mode: it is not necessary for CPO 
to be enabled. 

If either of the three least-significant bits of the effective address is 
non-zero, an address error exception takes place. 

If this instruction should both fail and take an exception, the exception 
takes precedence. 

Operation: 

T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
data <-GPR[rt] 
if LLbit then 

StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA) 
endif 

GPR[rt] <- 0 63 II LLbit 
SyncOperationQ 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SD Store Doubleword SD 


31 26 25 21 20 16 15 0 


SD 

base 

rt 

offset 

111111 




6 

5 

5 

16 


Format: 

SD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If either of the three least-significant bits of the effective address are 
non-zero, an address error exception occurs. 

Operation: 

T: vAddr <— ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
data <- GPR[rt] 

StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA) 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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QQ02 Store Doubleword 

From Coprocessor 


SDCz 


31 26 

25 21 

20 16 

15 0 

SDCz 

base 

rt 

offset 

1 1 1 1 X X* 




6 

5 

5 

16 


Format: 

SDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a 
doubleword, which the processor writes to the addressed memory location. 
The data to be stored is defined by individual coprocessor specifications. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

This instruction is not valid for use with CPO. 

This instruction is undefined when the least-significant bit of the rt 
field is non-zero. 

Operation: 

T: vAddr <- ((offset^ 48 I I offseti 5 ..o) + GPR[base] 

(pAddr, uncached) <— AddressTranslation (vAddr, DATA) 
data <- COPzSD(rt), 

StoreMemoiy (uncached, DOUBLEWORD, data, pAddr, 
vAddr, DATA) 


Note: *See the table in this section under “Opcode Bit Encoding." 
Also see “CPU Instruction Opcode Bit Encoding” at the end of Appendix A. 

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 


Opcode Bit Encoding: 


SDCz Bit 

#31 

30 

29 

28 

27 

26 

0 

SDC1 

1 

1 

1 

1 




Bit 

#31 

30 

29 

28 

27 

26 

0 

SDC2 

1 

1 ' 

1 

1 | 


0 






w 

J 



V Y 

SD opcode Coprocessor Unit Number 
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SDL 


Store Doubleword Left 


SDL 


31 26 

25 21 

20 16 

15 0 

SDL 

base 

rt 

offset 

101100 




6 

5 

5 

16 


Format: 

SDL rt, offset(base) 

Description: 

This instruction can be used with the SDR instruction to store the 
contents of a register into eight consecutive bytes of memory, when the 
bytes cross a doubleword boundary. SDL stores the left portion of the 
register into the appropriate part of the high-order doubleword of memory: 
SDR stores the right portion of the register into the appropriate part of the 
low-order doubleword. 

The SDL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which contains 
that byte. From one to four bytes will be stored, depending on the starting 
byte specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it copies bytes from register 
to memory until it reaches the low-order byte of the word in memory. 

No address exceptions due to alignment are possible. 



memory 

(big-endian) 

rAnistor 

address 8 

address 0 

8 

9 

10 

11 

12 

13 

14 

15 



0 

1 

2 

3 

4 

5 

6 

7 

before A B C D 

E F G H $24 



SDL $24,1 ($0) I J 


address 8 

address 0 

8 

9 

10 

11 

12 

13 

14 

15 

after 

0 

B 

C 

D 

E 

F 

G 

H 


Operation: 

T: vAddr ((offset 15 ) 48 II offset 15 . 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrp S | Z E_i ..3 II (pAddr 2 0 xor ReverseEndian 3 ) 
If BigEndianMem = 0 then 

pAddr <- pAddr 31 3 II 0 3 

endif 

byte vAddr 2 . o xor BigEndianCPU 3 
data <- O 56 - 8 ’^ II GPR[rt] 63 .. 56 _8* byte 
Storememory (uncached, byte, data, pAddr, vAddr, DATA) 
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Given a doubleword in a register and a doubleword in memory, the 
operation of SDL is as follows: 


SDL 


Register 

A 

B 

c 

D 

E 

F 

G 

H 




Memory 

n 

n 

K 

n 



0 

P 






BigEndianCPU = 0 




BigEndianCPU = 

1 



















offset 

vAddr 2 .. 0 


destination 



type 

LEM 

BEM 


destination 
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0 
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K 

L 

M 

N 

0 

A 



£3 

A B 

c 

D 

E 

F 

G 
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0 

0 

1 
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K 

L 

M 

N 

A 

B 




1 A 

B 

C 

D 

E 

F 

G 


0 

1 

2 
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K 

L 

M 

A 

B 

C 

1 


1 

1 J 

A 

B 

C 

D 

E 
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0 

2 

3 

1 J 

K 

L 

A 

B 

C 

D 

1 



1 J 

K 

A 

B 

C 

D 
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K 
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E 
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1 
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K 

L 

A 

B 

C 
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A 

B 

C 

D 

E 

F 
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K 

L 

M 

A 

B 

C 


0 

5 

6 
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B 

C 

D 

E 

F 

G 

6 

0 
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1 J 

K 

L 

M 

N 

A 

B 

1 

0 

6 

7 

A B 

C 

D 

E 

F 

G 

H 

7 

0 

0 

1 J 

K 

L 

M 

N 

O 

A 

0 

0 

7 


LEM Little-endian memory (BigEndianMem = 0) 

BEM BigEndianMem = 1 

Type AccessType (see Table 2. 1 on page 2-3) sent to memory 
Offset pAddr 2 ..o sent to memory 

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SDR 


Store Doubleword Right 


SDR 


31 26 

25 21 

20 16 

15 0 

SDR 

base 

rt 

offset 

101101 




6 

5 

5 

16 


Format: 

SDR rt, offset(base) 

Description: 

This instruction can be used with the SDL instruction to store the 
contents of a register into eight consecutive bytes of memory, when the 
bytes cross a boundary between two doublewords. SDR stores the right 
portion of the register into the appropriate part of the low-order 
doubleword: SDL stores the left portion of the register into the appropriate 
part of the low-order doubleword of memory. 

The SDR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which contains 
that byte. From one to eight bytes will be stored, depending on the starting 
byte specified. 

Conceptually, it starts at the least-significant (rightmost) byte of the 
register and copies it to the specified byte in memory; then it copies bytes 
from register to memory until it reaches the high-order byte of the word in 
memory. No address exceptions due to alignment are possible. 


address 8 
address 0 



( 

me 

[big- 

me 

en( 

yy 

fian 



register 

8 

9 

1C 

11 

12 

13 

14 

15 
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0 
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2 

3 
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5 

6 
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(big-endian) 

SDR $24,4($0 ) y 
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8 

9 

10 

11 12 

13 
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15 
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F 

G 
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6 
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Operation: 


T: vAddr <- ((offsef 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddr PS | ZE _ 13 II (pAddr 2 0 xor 

ReverseEndian 3 ) 

If BigEndianMem = 0 then 

pAddr <r- pAddrp S i2E-3i..3 II 0 3 

endif 

byte <- vAddr! 0 xor BigEndianCPU 3 

data <- GPR[rt] 63 _y bvte II O 8 *^ 6 


Given a doubleword in a register and a doubleword in memory, the 
operation of SDR is as follows: 
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SDR 


Register 

B 


c 

D 

E 







Memory 

B 

B 


B 




B 





vAddr 2 .. 0 

BigEndianCPU = 0 

BigEndianCPU = 

1 

destination 

type 

offset 

destination 

type 

offset 

LEM BEM 

LEM BEM 

0 

A BCDEFGH 

7 

0 0 

HJ KLMNOP 

0 

7 0 

1 

B CDEFGHP 

6 

1 0 

GHKLMNOP 

1 

6 0 

2 

CDEFGHOP 

5 

2 0 

FGHLMNOP 

2 

5 0 

3 

DEFGHNOP 

4 

3 0 

EFGHMNOP 

3 

4 0 

4 

E FGHMNOP 

3 

4 0 

DEFGHNOP 

4 

3 0 

5 

F GHLMNOP 

2 

5 0 

CDEFGHOP 

5 

2 0 

6 

GHKLMNOP 

1 

6 0 

BCDEFGH P 

6 

1 0 

7 

HJ KLMNOP 

0 

7 0 

ABCDEFGH 

7 

0 0 


LEM Little-endian memory (BigEndianMem = 0) 

BEM BigEndianMem = 1 

Type AccessType (see Table 2. 1 on page 2-3) sent to memory 
Offset pAddr 2 ..o sent to memory 

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SH Store Halfword SH 


31 26 

25 21 

20 16 

15 0 

SH 

base 

rt 

offset 

101001 




6 

5 

5 

16 


Format: 

SH rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The least-significant 
halfword of register rt is stored at the effective address. If the least- 
significant bit of the effective address is non-zero, an address error 
exception occurs. 

Operation: 

T: vAddr <- ((offset 5 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrpsizE-i ..3 H (pAddr 2 0 xor (ReverseEndian 2 II 0)) 
byte <- vAddr 2 0 xor (BigEndianCPU 2 II 0) 
data <— GPR[rt]e3_8*byte..o ^ 0 8 byte 

StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA) 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SLL 


Shift Left Logical 


SLL 


31 26 

25 21 

20 

16 

15 

11 

10 6 

5 0 

SPECIAL 

000000 

0 

00000 

rt 

rd 

sa 

SLL 

000000 

6 

5 

5 


5 


5 

6 


Format: 

SLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting 
zeros into the low-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 

T: s <- 0 II sa 

temp <- GPR[rt] 31 . s 0 II 0 s 
GPR[rd] <r- (temp 31 ) 32 II temp 


Exceptions: 

None 
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SLLV Shift Left Logical Variable SLLV 


31 26 25 21 20 16 15 11 10 6 5 0 
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rt 

rd 

0 

SLLV 
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00000 

000100 

6 

5 

5 
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5 

6 


Format: 

SLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left the number of bits 
specified by the low-order five bits contained In general register rs, 
inserting zeros Into the low-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 

T: s 0 II GP[rs] 4 0 

temp GPR[rt]( 31 _ S ) o II 0 s 
GPR[rd] <- (temp 31 ) 32 II temp 


Exceptions: 

None 
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SLT Set On Less Than SLT 


31 26 25 21 20 16 15 11 10 6 5 0 
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rs 

rt 
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0 
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000000 




00000 

10 1010 

6 

5 

5 

5 

5 
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Format: 

SLT rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the contents of general register 
rt, the result is set to one: otherwise the result is set to zero. 

The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 


T: if GPR[rs] < GPR[rt] then 

GPR[rd] 0 63 I1 1 
else 

GPR[rd] 0 64 
end if 


Exceptions: 

None 
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SLTI 


Set On Less Than Immediate 


SLTI 


31 26 25 21 20 16 15 0 



rs 

rt 

immediate 





6 

5 

5 

16 


Format: 

SLTI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the 
contents of general register rs. Considering both quantities as signed 
integers, if rs is less than the sign-extended immediate, the result is set to 
one; otherwise the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 


T: if GPR[rs] < (immediate 15 ) 48 II immediate-^ 0 then 

GPR[rd] <- 0 63 11 1 
else 

GPR[rd] <- 0 64 
endif 


Exceptions: 

None 
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SLTIU 


Set On Less Than 
Immediate Unsigned 


SLTIU 


31 26 25 21 20 16 15 0 


SLTIU 

rs 

rt 

immediate 

001011 




6 

5 

5 

16 


Format: 

SLTIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the 
contents of general register rs. Considering both quantities as unsigned 
integers, if rs is less than the sign-extended immediate, the result is set to 
one; otherwise the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 


T: if (0 I I GPR[rsl) <011 (immediate js) 48 I I immediate 15 0 then 

GPR[rd] <r- 0 63 I I 1 
else 

GPR[rdl <- 0 64 
endif 


Exceptions: 

None 
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SLTU Set On Less Than Unsigned SLTU 


31 26 25 21 20 16 15 11 10 6 5 0 
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rt 

rd 
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SLTU 

000000 




00000 

101011 

6 

5 

5 
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5 
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Format: 

SLTU rd, rs, it 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are less than the contents of general 
register rt, the result is set to one: otherwise the result is set to zero. 

The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 


T: if (0 I I GPRIrsl) <011 GPR[rt] then 

GPR[rdl <- 0 63 I I 1 
else 

GPR[rd] <- 0 64 
endif 


Exceptions: 

None 
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Format: 

SRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 
Operation: 

T: s <- 0 II sa 

temp <- (GPR[rt] 31 ) s II GPR[rt] 31 .. s 
GPR[rd] <- (temp 31 ) 32 II temp 

Exceptions: 

None 
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SRAV Arithmetic*\?ariable SRAV 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

rs 
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rd 
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00000 

000111 

6 

5 
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6 


Format: 

SRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order five bits of general register rs, sign- 
extending the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 


T: s<-GPR[rs] 40 

temp <- (GPR[rt] 31 ) s II GPR[rt] 31 s 
GPR[rd] (temp 31 ) 32 II temp 


Exceptions: 

None 
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SRL Shift Right Logical SRL 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 

0 

rt 

rd 

sa 

SRL 

000000 

00000 




000010 


6 5 5 5 5 6 


Format: 

SRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 

T: s 4-0 lisa 

temp 4- 0 s II GPR[rt] 31 s 
GPR[rd] 4- (temp 31 ) 32 II temp 


Exceptions: 

None 
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SRLV Shift Right Logical Variable SRLV 


31 26 25 21 20 16 15 11 10 6 5 0 


SPECIAL 
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rt 

rd 
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SRLV 

000110 

6 
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5 
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Format: 

SRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order five bits of general register rs, inserting zeros 
into the high-order bits. 

The result is placed in register rd. 

The operand must be a valid sign-extended, 32-bit value. 

Operation: 


T: s GPR[rs] 4 .. 0 

temp <- 0 s II GPR[rt] 31 ,_ s 
GPR[rd] <- (temp 31 ) 32 II temp 


Exceptions: 

None 
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SUB Subtract SUB 


31 26 25 21 20 16 15 11 10 6 5 0 
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rt 

rd 
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00000 
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6 
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5 

5 

5 

6 


Format: 

SUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. The operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the SUBU instruction 
is that SUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 30 
and 31 differ (2’s complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

Operation: 

T: temp <- GPR[rs] - GPR[rt] 

GPR[rd] <- (temp 31 ) 32 II temp 31 0 


Exceptions: 

Integer overflow exception 
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SUBU Subtract Unsigned SUBU 


31 26 25 21 20 16 15 11 10 65 0 
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rt 

rd 

0 

SUBU 

000000 




00000 

100011 


6 5 5 5 5 6 


Format: 

SUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. 

The result is placed into general register rd. 

The operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the SUB instruction 
is that SUBU never traps on overflow. No integer overflow exception occurs 
under any circumstances. 

Operation: 

T: temp GPR[rs] - GPR[rt] 

GPR[rd] <- (temp 31 ) 32 II temp 31 0 


Exceptions: 

None 
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sw Store Word SW 


31 26 

25 21 

20 16 

15 0 

SW 

base 

rt 

offset 

101011 




6 

5 

5 

16 


Format: 

SW rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If either of the two least-significant bits of the effective address are non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) AddressTranslation (vAddr, DATA) 
pAddr <— pAddrpsizE-i..3 H (pAddr 20 xor (ReverseEndian II 0 2 ) 
byte <- vAddr 2 0 xor (BigEndianCPU II 0 2 ) 
data <- GPR[rt] 63 . 8 * b y te II 0 8 * byte 

StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SWCz Store Word From Coprocessor SWCz 


31 26 

25 21 

20 16 

15 

0 


SWCz 

1 1 1 0 x x* 

base 

rt 

offset 

6 

5 

5 

16 



Format: 

SWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a word, 
which the processor writes to the addressed memory location. 

The data to be stored is defined by individual coprocessor 
specifications. 

This instruction is not valid for use with CPO. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

Operation: 


T: vAddr «- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddr PS | ZE . 13 II (pAddr 2 0 xor (ReverseEndian II 0 2 ) 
byte <- vAddr 2 0 xor (BigEndianCPU II 0 2 ) 
data <r- COPzSW (byte.rt) 

StoreMemory (uncached, WORD, data, pAddr, vAddr DATA) 


Note: *See the table in this section under "Opcode Bit Encoding." 
Also see “CPU Instruction Opcode Bit Encoding” at the end of Appendix A. 

Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 


Opcode Bit Encoding: 


SWCz Bit 

#31 

30 

29 

28 

27 

26 

0 

SWC1 

1 

1 

1 

0 

0 

1 


Bit 
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27 

26 

0 
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0 1 
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7 


SW opcode Coprocessor Unit Number 
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SWL Store Word Left SWL 


31 26 
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15 0 

SWL 

base 

rt 

offset 

101010 




6 

5 

5 

16 


Format: 

SWL rt, offset(base) 

Description: 

This instruction can be used with the SWR instruction to store the 
contents of a register into four consecutive bytes of memory, when the 
bytes cross a word boundary. SWL stores the left portion of the register 
into the appropriate part of the high-order word of memory; SWR stores the 
right portion of the register into the appropriate part of the low-order word. 

The SWL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which contains 
that byte. From one to four bytes will be stored, depending on the starting 
byte specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it copies bytes from register 
to memory until it reaches the low-order byte of the word in memory. 

No address exceptions due to alignment are possible. 



Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 .. 0 ) + GPRfbase] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <r- pAddr PS | ZE _ 13 II (pAddr 20 xor ReverseEndian 3 ) 
If BigEndianMem = 0 then 
pAddr <- pAddr 31 2 II 0 2 
endif 

byte<-vAddr-| 0 xor BigEndianCPU 2 
if (vAddr 2 xor BigEndianCPU) = 0 then 

data <- 0 32 II O 24 - 8 *^ II GPR[rt] 31 24 - 8 * b yte 
else 

data <- o24-8*byte || GPR[rt] 31 .. 24 - 8 *byte H 0 32 
endif 

StoreMemory(uncached, byte, data, pAddr, vAddr, DATA) 
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Given a doubleword in a register and a doubleword in memory, the 
operation of SWL is as follows: 
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4 
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1 
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E 

0 

I 
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LEM Little-endian memory (BigEndianMem = 0) 

BEM BigEndianMem = 1 

Type AccessType (see Table 2.1 on page 2-3) sent to memory 
Offset pAddr 2 ,.o sent to memory 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SWR Store Word Right SWR 
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base 

rt 

offset 

101110 




6 

5 

5 

16 


Format: 

SWR rt, offset(base) 

Description: 

This instruction can be used with the SWL instruction to store the 
contents of a register into four consecutive bytes of memory, when the 
bytes cross a boundary between two words. SWR stores the right portion 
of the register into the appropriate part of the low-order word; SWL stores 
the left portion of the register into the appropriate part of the low-order 
word of memory. 

The SWR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which contains 
that byte. From one to four bytes will be stored, depending on the starting 
byte specified. 

Conceptually, it starts at the least-significant (rightmost) byte of the 
register and copies it to the specified byte in memory; then copies bytes 
from register to memory until it reaches the high-order byte of the word in 
memory. 

No address exceptions due to alignment are possible. 
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Operation: 


T: vAddr <- ((offset^) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr <- pAddrp S |ZE-i ..3 II (pAddr 2 . 0 xor Reverse Endian 3 ) 

If BigEndianMem = 0 then 
pAddr <- pAddr 312 II 0 2 
endif 

byte vAdd^ _ 0 xor BigEndianCPU 2 
if (vAddr 2 xor BigEndianCPU) = 0 then 
data <- 0 32 II GPR[rt] 31 . 8 * byte 0 II 0 8 * b y‘ e 
else 

data <- GPR[rt] 31 . 8 * byte0 II 0 a ' b v te II 0 32 
endif 

StoreMemory(uncached, WORD-byte, data, pAddr, vAddr, DATA) 


Given a doubleword in a register and a doubleword in memory, the 
operation of SWR is as follows: 


SWR 


Register 

A 

B 

C 

D 



G 

H 




Memory 

1 

J 

K 

L 

M 

N 

0 

P 





vAddr 2 ..o 

BigEndianCPU = 0 

BigEndianCPU = 

1 

destination 


offset 

destination 

type 

offset 


LEM BEM 

0 

1 J K L E F G H 


0 4 

HJ KLMNOP 


7 0 

1 

1 J K L F G H P 


1 4 

GHKLMNOP 


6 0 

2 

1 J K L G H 0 P 

1 

2 4 

FGHLMNOP 


5 0 

3 

1 J K L H N 0 P 

0 

3 4 

EF GHMNOP 

H 

4 0 

4 

EFGHMNOP 


4 0 

1 J K L H N 0 P 

0 

3 4 

5 

F GHLMNOP 


5 0 

1 J K L G H 0 P 

1 

2 4 

6 

GHKLMNOP 

1 

6 0 

1 J K L F G H P 


1 4 

7 

HJ KLMNOP 

0 

7 0 

1 J K L E F G H 


0 4 


LEM Little-endian memory (BigEndianMem = 0) 

BEM BigEndianMem = 1 

Type AccessType (see Table 2. 1 on page 2-3) sent to memory 
Offset pAddr 2 . .o sent to memory 

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SYNC Synchronize SYNC 


31 26 25 


6 5 0 


SPECIAL 

000000 


0 

0000 0000 0000 0000 0000 


SYNC 

001111 


6 


20 


6 


Format: 

SYNC 

Description: 

The SYNC instruction ensures that any loads and stores fetched prior 
to the present instruction are completed before any loads or stores after 
this instruction are allowed to start. Use of the SYNC instruction to 
serialize certain memoiy references may be required in a multiprocessor 
environment for proper synchronization. For example: 


Processor A 

Processor B 

SW R1 , DATA 

LI R2, 1 

SYNC 

SW R2, FLAG 

1: LW R2, FLAG 

BEQ R2, R0, IB 

NOP 

SYNC 

LW R1 , DATA 


The SYNC in processor A prevents DATA being written after FLAG, 
which could cause processor B to read stale data. The SYNC in processor 
B prevents DATA from being read before FLAG, which could likewise result 
in reading stale data. For processors which only execute loads and stores 
in order, with respect to shared memoiy, this instruction is a NOP. 

LL and SC instructions implicitly perform a SYNC. 

This instruction is allowed in User mode. 

Operation: 


T: SyncOperationQ 


Exceptions: 

None 
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SYSCALL System Call SYSCALL 


31 26 25 


6 5 0 


SPECIAL 

000000 


Code 


SYSCALL 
0 0 1 1 00 


6 


20 


6 


Format: 

SYSCALL 

Description: 

A system call exception occurs, Immediately and unconditionally 
transferring control to the exception handler. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 


T : SystemCallException 


Exceptions: 

System Call exception 
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TEQ Trap If Equal TEQ 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

code 

TEQ 

000000 




110100 

6 

5 

5 

10 

6 


Format: 

TEQ rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 


T: if GPR[rs] = GPR[rt] then 
TrapException 
endif 


Exceptions: 

Trap exception 
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Format: 

TEQI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. If the contents of general register rs are equal to the 
sign-extended immediate, a trap exception occurs. 

Operation: 

T: if GPR[rs] = (immediate ^) 48 II immediate 15 0 then 

TrapException 
endif 


Exceptions: 

Trap exception 
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TGE Trap If Greater Than Or Equal TGE 


31 26 

25 21 

20 16 

15 6 

5 0 

SPECIAL 

rs 

rt 

code 

TGE 

000000 




110000 

6 

5 

5 

10 

6 


Format: 

TGE rs, rt 

Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are greater than or equal to the contents of 
general register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 


T: if GPR[rs] > GPR[rt]then 

TrapException 

endif 


Exceptions: 

Trap exception 
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TGEI Trap If Greater Than Or Equal Immediate 


31 26 

25 21 

20 16 

15 0 

REGIMM 

rs 

TGEI 

immediate 

000001 


01000 


6 

5 

5 

16 


Format: 

TGEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 


T : if GPR[rs] > (immediate 15 ) 48 II immediate-^. 0 then 
TrapException 
endif 


Exceptions: 

Trap exception 
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TGEIU 


Trap If Greater Than Or Equal 
Immediate Unsigned 


TGEIU 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

TGEIU 

immediate 

000001 


01001 


6 

5 

5 

16 


Format: 

TGEIU rs, Immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 


T: if (0 II GPR[rs]) > (0 II (immediate 15 ) 48 II immediate-, 50 ) then 
Trap Exception 
end if 


Exceptions: 

Trap exception 
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TGEU Trap If Greater Than Or Equal Unsigned TGEU 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

code 

TGEU 

000000 




110001 

6 

5 

5 

10 

6 


Format: 

TGEU rs, rt 

Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as unsigned integers, if 
the contents of general register rs are greater than or equal to the contents 
of general register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 


T: 

if (0 II GPR[rs]) > (0 II GPR[rt]) then 


TrapException 


endif 


Exceptions: 

Trap exception 
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TLBP Probe TLB For Matching Entry TLBP 


31 26 

25 

24 6 

5 0 

COPO 

CO 

0 

TLBP 

010000 

1 

000 0000 0000 0000 0000 

001 000 

6 

1 

19 

6 


Format: 

TLBP 

Description: 

The Index register is loaded with the address of the TLB entry whose 
contents match the contents of the EntryHi register. If no TLB entry 
matches, the high-order bit of the Index register is set. 

The architecture does not specify the operation of memory references 
associated with the instruction immediately after a TLBP instruction, nor 
is the operation specified if more than one TLB entry matches. 

Operation: 


T: Index*- 1 II 0 31 

for i in O..TLBEntries-1 

if (TLB[i] 167 14 i and not (0 15 II TLB[i] 2 i 6 .. 205 )) 

= EntryHi 39 . , 13 ) and not (0 15 II TLB[i] 216 .. 205 )) and 
(TLB[i]i 4 o or (TLB[i] 135--128 = EntryHfy.o)) then 
Index*- 0 26 II i 5i 0 
endif 
endfor 


Exceptions: 

Coprocessor unusable exception 
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TLBR Read Indexed TLB Entry TLBR 


CO 

Osl 

CO 

25 

24 6 

5 0 

COPO 

CO 

0 

TLBR 

010000 

1 

000 0000 0000 0000 0000 

000001 

6 

1 

19 

6 


Format: 

TLBR 

Description: 

The Gbit (which controls ASID matching) read from the TLB is written 
into both of the EntryLoO and EntryLol registers. 

The EntryHi and EntryLo registers are loaded with the contents of the 
TLB entry pointed at by the contents of the TLB Index register. The 
operation is invalid (and the results are unspecified) if the contents of the 
TLB Index register are greater than the number of TLB entries in the 
processor. 

Operation: 


T: PageMask<- TLB[lndex 5 , ,o) 255 ..i 92 

EntryHi <- TLB[lndex 5 0 ]i9i.. 128 and not TLB[lndex 5 , 0 ) 255.. 192 
EntryLol <— TLB[lndexg „o)i 27..65 ^ TLB[lndexg ,, 0)140 
EntryLoO <- TLB[lndex 50 ] 631 II TLB[lndex s 0 ] 140 


Exceptions: 

Coprocessor unusable exception 
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TLBWI Write Indexed TLB Entry TLBWI 


31 26 

25 

24 

6 

5 0 

COPO 

0 1 0000 

CO 

1 

0 

000 0000 0000 0000 0000 

TLBWI 
00001 0 

6 

1 

19 


6 


Format: 

TLBWI 

Description: 

The G bit of the TLB is written with the logical AND of the G bits in the 
EntryLoO and EntryLol registers. 

The TLB entry pointed at by the contents of the TLB Index register is 
loaded with the contents of the EntryHi and EntryLo registers. 

The operation is invalid (and the results are unspecified) if the contents 
of the TLB Index register are greater than the number of TLB entries in the 
processor. 

Operation: 


T: TLBflndexs.o] <- 

PageMask II (EntryHi and not PageMask) II EntryLol II EntryLoO 


Exceptions: 

Coprocessor unusable exception 
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TLBWR Write Random TLB Entry TLBWR 


31 26 25 24 6 5 0 


COPO 

CO 

0 

TLBWR 

010000 

1 

000 0000 0000 0000 0000 

0001 1 0 

6 

1 

19 

6 


Format: 

TLBWR 

Description: 

The G bit of the TLB is written with the logical AND of the G bits in the 
EntryLoO and EntryLol registers. 

The TLB entry pointed at by the contents of the TLB Random register 
is loaded with the contents of the EntryHi and EntryLo registers. 

Operation: 


T: TLB[Random 5 0 ] <- 

PageMask II (EntryHi and not PageMask) II EntryLol II EntryLoO 


Exceptions: 

Coprocessor unusable exception 
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TLT Trap If Less Than TLT 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

code 

TLT 

000000 




110010 

6 

5 

5 

10 



Format: 

TLT rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as signed integers, if the contents of general 
register rs are less than the contents of general register rt, a trap exception 
occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 

T: if GPR[rs] < GPR[rt] then 
TrapException 
endif 


Exceptions: 

Trap exception 
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TLTI Trap If Less Than Immediate TLTI 


31 26 

25 21 

20 16 

15 0 

REGIMM 

rs 

TLTI 

immediate 

000001 


01010 


6 

5 

5 

16 


Format: 

TLTI rs. Immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the sign-extended immediate, 
a trap exception occurs. 

Operation: 

T: if GPR[rs] < (immediate 15 ) 48 II immediate! 5 0 then 
TrapException 
endif 


Exceptions: 

Trap exception 
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TLTIU Trap If Less Than Immediate Unsigned TLTIU 


31 26 25 21 20 16 15 0 


REGIMM 

rs 

TLTIU 

immediate 

000001 


01011 


6 

5 

5 

16 


Format: 

TLTIU rs, immediate 

Description: 

The 16 -bit immediate is sign-extended and compared to the contents 
of general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the sign-extended immediate, 
a trap exception occurs. 

Operation: 

T: if (0 il GPR[rs]) < (0 II (immediate 15 ) 48 II immediate-^ 0 ) then 
TrapException 
endif 


Exceptions: 

Trap exception 
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TLTU Trap If Less Than Unsigned TLTU 


31 26 

25 21 

20 16 

15 6 

5 0 

SPECIAL 

rs 

>1 

code 

TLTU 

000000 




110011 

6 

5 

5 

10 

6 


Format: 

TLTU rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as unsigned integers, if the contents of general 
register rs are less than the contents of general register rt, a trap exception 
occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 

T: if (0 II GPR[rs]) < (0 II GPR[rt]) then 

TrapException 
endif 


Exceptions: 

Trap exception 
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TNE Trap If Not Equal TNE 


31 26 25 21 20 16 15 6 5 0 


SPECIAL 

rs 

rt 

code 

TNE 

000000 




110110 

6 

5 

5 

10 

6 


Format: 

TNE rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are not equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 

T: if GPR[rs] * GPR[rt] then 

Trap Exception 
endif 


Exceptions: 

Trap exception 


A- 146 









CPU Instruction Set Details 


Appendix A 


TNEI Trap If Not Equal Immediate TNEI 


31 26 

25 21 

20 16 

15 0 

REGIMM 

rs 

TNEI 

immediate 

000001 


01110 


6 

5 

5 

16 


Format: 

TNEI rs. Immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents 
of general register rs. If the contents of general register rs are not equal to 
the sign-extended immediate, a trap exception occurs. 

Operation: 

T : if GPR[rs] * (immediate^) 48 II immediate 15 0 then 
TrapException 
endif 


Exceptions: 

Trap exception 
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WAIT wait WAIT 


CO 

CM 

CO 

25 

24 

6 

5 0 

COPO 

01 0000 

CO 

1 

0 

000 0000 0000 0000 0000 

WAIT 

1 00000 


1 

19 


6 


Format: 

WAIT 

Description: 

The WAIT instruction is used to halt the internal pipeline and thus 
reduce the power consumption of the CPU. See Appendix G for more 
details. 

Operation: 


T: 


if SysAD bus is idle then 
StopPipeline 


end if 


Exceptions: 

Coprocessor unusable exception 
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XOR Exclusive Or XOR 


31 26 25 21 20 16 15 11 10 65 0 


SPECIAL 

rs 

rt 

rd 

0 

XOR 

000000 




00000 

100110 

6 

5 

5 

5 

5 

6 


Format: 

XOR rd, rs, rt 


Description: 

The contents of general register rs are combined with the contents of 
general register rt In a bit-wise logical exclusive OR operation. 

The result is placed into general register rd. 

Operation: 


T: GPR[rd] <- GPR[rs] xor GPR[rt] 


Exceptions: 

None 
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XORI Exclusive OR Immediate XORI 


31 26 25 21 20 16 15 0 


XORI 

rs 

rt 

immediate 

001110 




6 

5 

5 

16 


Format: 

XORI it, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical exclusive OR operation. 

The result is placed into general register rt 

Operation: 


T: GPR[rt] <- GPR[rs] xor (0 48 II immediate) 


Exceptions: 

None 
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CPU Instruction Opcode Bit Encoding 

The remainder of this Appendix presents the opcode bit encoding for 
the CPU instruction set (ISA and extensions), as implemented by the 
R4600/R4700. 

Table A.4 lists the R4600/R4700 Opcode Bit Encoding. 
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SPECIAL REGIMM 


ADDIU SLTI SLTIU 


COPO COP1 COP2 


DADDI DADDIU LDL 


LH LWL 




LWC1 LWC2 




BEQ 

BNE 

BLEZ 

BGTZ 

ANDI 

ORI 

XORI 

LUI 

BEQL 

BN EL 

BLEZL 

BGTZL 



Key to Table: 

* Operation codes marked with an asterisk cause reserved instruction exceptions in all current 
implementations and are reserved for future versions of the architecture, 
g Operation codes marked with a gamma cause a reserved instruction exception. They are 
reserved for future versions of the architecture, 
d Operation codes marked with a delta are valid only for R4600 processors with CPO enabled, 
and cause a reserved instruction exception on other processors, 
f Operation codes marked with a phi are invalid but do not cause reserved instruction 
exceptions in R4600 implementations. 
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FPU Instruction Set 

Appendix B 

Details 



Introduction 

This appendix provides a detailed description of each floating-point 
unit (FPU) instruction (refer to Appendix A for a detailed description of the 
CPU instructions). The instructions are listed alphabetically, and any 
exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate causes and the manner of handling exceptions are omitted from 
the instruction descriptions in this appendix (refer to Chapter 7 for 
detailed descriptions of floating-point exceptions and handling). 

Figure B.3 on page B-45 lists the entire bit encoding for the constant 
fields of the floating-point instruction set; the bit encoding for each 
instruction is included with that individual instruction. 

Instruction Formats 

There are three basic instruction format types: 

• I-Type, or Immediate instructions, which include load and store oper- 
ations 

• M-Type, or Move instructions 

• R-Type, or Register instructions, which include the two- and three- 
register floating-point operations. 

The instruction description subsections that follow show how these 
three basic instruction formats are used by: 

• Load and store instructions 

• Move instructions 

• Floating-Point computational instructions 

• Floating-Point branch instructions 

Floating-point instructions are mapped onto the MIPS coprocessor 
instructions, defining coprocessor unit number one (CPI) as the floating- 
point unit. 

Each operation is valid only for certain formats. Implementations may 
support some of these formats and operations through emulation, but they 
only need to support combinations that are valid (marked V in Table B. 1). 

Combinations marked R in Table B. 1 are not currently specified by this 
architecture, and cause an unimplemented instruction trap. They will be 
available for future extensions to the architecture. 
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Operation 

Source Format 

Single 

Double 

Word 

Longword 

ADD 

V 

V 

R 

R 

SUB 

V 

V 

R 

R 

MUL 

V 

V 

R 

R 

DIV 

V 

V 

R 

R 

SQRT 

V 

V 

R 

R 

ABS 

V 

V 

R 

R 

MOV 

V 

V 



NEG 

V 

V 

R 

R 

TRUNC.L 

V 

V 



ROUND. L 

V 

V 



CEIL.L 

V 

V 



FLOOR.L 

V 

V 



TRUNC.W 

V 

V 



ROUND. W 

V 

V 



CEIL.W 

V 

V 




V 

V 



CVT.S 


V 

V 

V 

CVT.D 

V 


V 

V 

CVT.W 

V 

V 



CVT.L 

V 

V 



C 

V 

V 

R 

R 


Table B.l Valid FPU Instruction Formats 


The coprocessor branch on condition true/false instructions can be 
used to logically negate any predicate. Thus, the 32 possible conditions 
require only 16 distinct comparisons, as shown in Table B.2 below. 
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Condition 

Relations 

Invalid 
Operation 
Exception If 
Unordered 

Mnemonic 



Less 

Than 

Equal 

Unordered 

True 

False 

F 

T 

0 

F 

F 

F 

F 

No 

UN 

OR 

1 

F 

F 

F 

T 

No 

EQ 

NEQ 

2 

F 

F 

T 

F 

No 

UEQ 

OGL 

3 

F 

F 

T 

T 

No 

OLT 

UGE 

4 

F 

T 

F 

F 

No 

ULT 

OGE 

5 

F 

T 

F 

T 

No 

OLE 

UGT 

6 

F 

T 

T 

F 

No 

ULE 

OGT 

7 

F 

T 

T 

T 

No 

SF 

ST 

8 

F 

F 

F 

F 

Yes 

NGLE 

GLE 

9 

F 

F 

F 

T 

Yes 

SEQ 

SNE 

10 

F 

F 

T 

F 

Yes 

NGL 

GL 

11 

F 

F 

T 

T 

Yes 

LT 

EQHI 


F 

T 

F 

F 

Yes 

NGE 


13 

F 

T 

F 

T 

Yes 

LE 

NLE 

14 

F 

T 

T 

F 

Yes 

NGT 

GT 

15 

F 

T 

T 

T 

Yes 


Table B.2 Logical Negation of Predicates by Condition True/False 


Floating-Point Loads, Stores, and Moves 

All movement of data between the floating-point coprocessor and 
memory is accomplished by coprocessor load and store operations, which 
reference the floating-point coprocessor General Purpose registers. These 
operations are unformatted; no format conversions are performed and, 
therefore, no floating-point exceptions can occur due to these operations. 

Data may also be directly moved between the floating-point 
coprocessor and the processor by move to coprocessor and move from 
coprocessor instructions. Like the floating-point load and store operations, 
move to/from operations perform no format conversions and never cause 
floating-point exceptions. 

An additional pair of coprocessor registers are available, called 
Floating-Point Control registers for which the only data movement 
operations supported are moves to and from processor General Purpose 
registers. 
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Floating-Point Operations 

The floating-point unit operation set includes: 

• floating-point add 

• floating-point subtract 

• floating-point multiply 

• floating-point divide 

• floating-point square root 

• convert between fixed-point and floating-point formats 

• convert between floating-point formats 

• floating-point compare 

These operations satisfy the requirements of IEEE Standard 754 
requirements for accuracy. Specifically, these operations obtain a result 
which is identical to an infinite-precision result rounded to the specified 
format, using the current rounding mode. 

Instructions must specify the format of their operands. Except for 
conversion functions, mixed-format operations are not provided. 

Instruction Notation Conventions 

In this appendix, all variable subfields in an instruction format (such 
as j's, ft. immediate , and so on) are shown in lower-case. The instruction 
name (such as ADD, SUB, and so on) is shown in upper-case. 

For the sake of clarify, we sometimes use an alias for available subfield 
in the formats of specific instructions. For example, we use rs = base in 
the format for load and store instructions. Such an alias is always lower 
case, since it refers to a variable subfield. 

In some instructions, the instruction subfields op and. Junction can 
have constant 6-bit values. When reference is made to these instructions, 
upper-case mnemonics are used. For instance, in the floating-point ADD 
instruction we use op = COP1 and function = FADD. In other cases, a 
single field has both fixed and variable subfields, so the name contains 
both upper and lower case characters. Bit encoding for mnemonics are 
shown in Figure B.3 at the end of this appendix, and are also included with 
each individual instruction. 

In the instruction description examples that follow, the Operation 
section describes the operation performed by each instruction using a 
high-level language notation. 

Instruction Notation Examples 

The following examples illustrate the application of some of the 
instruction notation conventions: 


Example #1 : 

GPRfit] • immediate II 0* 6 ; 

Sixteen zero bits are concatenated with an immediate 
value (typically 16 bits), and the 32-bit string (with the lower 
1 6 bits set to zero) is assigned to General Purpose Register rt. 


Example #2: 

(immediate is )- 6 llimmediate 1 5i0 

Bit 1 5 (the sign bit) of an immediate value is extended for 
1 6 bit positions, and the result is concatenated with bits 1 5 
through 0 of the immediate value to form a 32-bit sign 
extended value. 
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Load and Store Instructions 

In the R4600 implementation, the instruction immediately following a 
load may use the contents of the register being loaded. In such cases, the 
hardware interlocks, requiring additional real cycles, so scheduling load 
delay slots is still desirable, although not required for functional code. 

The behavior of the load store instructions is dependent on the width 
of the FGRs. 

• When the FR bit in the Status register equals zero, the Floating-Point 
General registers (FGRs) are 32-bits wide. 

• When the FR bit in the Status register equals one, the Floating-Point 
General registers (FGRs) are 64-bits wide. 

In the load and store operation descriptions, the functions listed in 
Table B.3 are used to summarize the handling of virtual addresses and 
physical memory. 


Function 

Meaning 

AddressTranslation 

Uses the TLB to find the physical address given the virtual 
address. The function fails and an exception is taken if 
the required translation is not present in the TLB. 

LoadMemory 

Uses the cache and main memory to find the contents of 
the word containing the specified physical address. The 
low-order two bits of the address and the Access Type field 
indicates which of each of the four bytes within the data 
word need to be returned. If the cache is enabled for this 
access, the entire word is returned and loaded into the 
cache. 

StoreMemory 

Uses the cache, write buffer, and main memory to store 
the word or part of word specified as data in the word con- 
taining the specified physical address. The low-order two 
bits of the address and the Access Type field indicates 
which of each of the four bytes within the data word 
should be stored. 


Table B.3 Load and Store Common Functions 


Figure B. 1 shows the I-Type instruction format used by load and store 
operations. 


I -Type (Immediate) 

31 26 25 21 20 16 15 0 


op 

base 

tt 

offset I 

6 

5 

5 

16 


op is a 6-bit operation code 

base is the 5-bit base register specifier 

ft is a 5-bit source (for stores) or destination (for loads) FPA register specifier 

offset is the 16-bit signed immediate offset 

Figure B. 1 Load and Store Instruction Format 
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All coprocessor loads and stores reference aligned-word data items. 
Thus, for word loads and stores, the access type field is always WORD, and 
the low-order two bits of the address must always be zero. 

For doubleword loads and stores, the access type field is always 
DOUBLEWORD, and the low-order three bits of the address must always 
be zero. 

Regardless of byte-numbering order (endianness), the address 
specifies that byte which has the smallest byte-address in the addressed 
field. For a big-endian machine, this is the leftmost byte; for a little-endian 
machine, this is the rightmost byte. 

Computational Instructions 

Computational instructions include all of the arithmetic floating-point 
operations performed by the FPU. 

Figure B.2 shows the R-Type instruction format used for 
computational operations. 


R-Type (Register) 

31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

fmt 

ft 

fs 

fd 

function | 

6 

5 

5 

5 

5 

6 


COP1 

is a 6-bit operation code 

fmt 

is a 5-bit format specifier 

fs 

is a 5-bit source 1 register 

ft 

is a 5-bit source2 register 

fd 

is a 5-bit destination register 

function 

is a 6-bit function field 


” Figure B.2 Computational Instruction Format ™ 

The Junction field indicates the floating-point operation to be 
performed. 

Each floating-point instruction can be applied to a number of operand 
formats. The operand format for an instruction is specified by the 5-bit 
format field; decoding for this field is shown in Table B.4. 


Code 

Mnemonic 

Size 

Format 

16 

S 

single 

Binary floating-point 

17 

D 

double 

Binary floating-point 

18 

Reserved 

19 

Reserved 

20 

W 

single 

32-bit binary fixed-point 


L 

longword 

64-bit binaiy fixed-point 

22-31 

Reserved 


Table B.4 Format Field Decoding 
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Table B.5 lists all floating-point instructions. 


Code 
(S: 0) 

Mnemonic 

Operation 

0 

ADD 

Add 

1 

SUB 

Subtract 

2 

MUL 

Multiply 

3 

DIV 

Divide 

4 

SQRT 

Square root 

5 

ABS 

Absolute value 

6 

MOV 

Move 

7 

NEG 

Negate 


ROUND. L 

Convert to single fixed-point, rounded to nearest/even 

9 


Convert to single fixed-point, rounded toward zero 

10 

CEIL.L 

Convert to single fixed-point, rounded to 

11 

FLOOR. L 

Convert to single fixed-point, rounded to -<» 

12 

ROUND .W 

Convert to single fixed-point, rounded to nearest/even 

13 

TRUNC.W 

Convert to single fixed-point, rounded toward zero 

14 

CEIL.W 

Convert to single fixed-point, rounded to + 

15 

FLOOR.W 

Convert to single fixed-point, rounded to -«> 

16-31 

- 

Reserved 

32 

CVT.S 

Convert to single floating-point 

33 

CVT.D 

Convert to double floating-point 

34 

- 

Reserved 

35 

- 

Reserved 

36 

CVT.W 

Convert to 32-bit binary fixed-point 

37 


Convert to 64-bit binary fixed-point 

38-47 

- 

Reserved 

48-63 

c 

Floating-point compare 


Table B.5 Floating-Point Instructions and Operations 


In the following pages, the notation FGR refers to the 32 General 
Purpose registers FGRO through FGR31 of the FPU, and FPR refers to the 
floating-point registers of the FPU. 

• When the FR bit in the Status register (SR(26)) equals zero, only the 
even floating-point registers are valid and the 32 General Purpose reg- 
isters of the FPU are 32-bits wide. 

• When the FR bit in the Status register (SR(26)) equals one, both odd 
and even floating-point registers may be used and the 32 General Pur- 
pose registers of the FPU are 64-bits wide. 

The following routines are used in the description of the floating-point 
operations to retrieve the value of an FPR or to change the value of an FGR 
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FR = 0 

value <- ValueFPR(fpr, fmt) 
case fmt of 
S, W: 

if FGRq = 0 
value <- FGR[fpr] 
else 

value <- FGRffpr - 1] 

endif 

D: 

/* undefined for fpr not even */ 

value <- FGR[fpr] 

end 

StoreFPR(fpr, fmt, value): 
case fmt of 
S, W: 

if FGRq = 0 

FGR[fpr] <- FGR[fpr] 63 32 I I value 
else 

FGRlfpr - 1] value I I FGR[fpr - 1] 31 0 

endif 

D: 

/* undefined for fpr not even */ 

FGRffpr] <- value 
end 


FR = 1 

value <- ValueFPRffpr, fmt) 
case fmt of 
S: 

value <- FGR[fpr] 3 i 0 
D, L: 

value <- FGRffpr] 

W: 

value <- FGRffpr] 
end 

StoreFPRffpr, fmt, value): 
case fmt of 
S, W: 

FGRffpr] <- undefined 32 I I value 
D, L: 

FGRffpr] <- value 
end 
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ABS.fmt 


Floating-Point 
Absolute Value 


ABS.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

0 

00000 

fs 

fd 

ABS 

0001 01 

6 

5 

5 

5 

5 

6 


Format: 

ABS.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified format and the arithmetic absolute value is taken. The result is 
placed in the floating-point register specified by fd. 

The absolute value operation is arithmetic; a NaN operand signals 
invalid operation. 

This instruction is valid only for single- and double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 


T : StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt))) 


Exceptions: 

Coprocessor unusable exception 
Coprocessor exception trap 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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ADD.fmt Floating-Point Add ADD.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

ft 

fs 

fd 

ADD 

000000 

6 

5 

5 

5 

5 

6 


Format: 

ADD.fmt fd, fs, ft 

Description: 

The contents of the FPU registers specified by fs and ft are interpreted 
in the specified format and arithmetically added. The result is rounded as 
if calculated to infinite precision and then rounded to the specified format 
(fmt), according to the current rounding mode. The result is placed in the 
floating-point register (FPR) specified by/d. 

This instruction is valid only for single- and double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 


T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
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BC1F 


Branch On FPA False 
(Coprocessor 1) 


BC1F 


31 26 25 21 20 1615 0 


COP1 

BC 

BCF 

offset 

010001 

0 1000 

00000 


6 

5 

5 

16 


Format: 

BC 1 F offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16 -bit offset, shifted left two bits 
and sign-extended. If the result of the last floating-point compare is false, 
the program branches to the target address, with a delay of one 
instruction. 

Operation: 


T-1: 

condition not COC[1 ] 

T: 

target <- (offset 15 ) 46 II offset II 0 2 

T+1: 

if condition then 


PC PC + target 


endif 


Exceptions: 

Coprocessor unusable exception 
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BC1FL 


Branch On FPU False Likely 
(Coprocessor 1) 


BC1FL 


31 26 25 21 20 16 15 0 


COP1 

BC 

BCFL 

offset 

010001 

0 1000 

000 1 0 


6 

5 

5 

16 


Format: 

BC1FL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. 

If the result of the last floating-point compare is false, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 


Operation: 


T-1: 

condition <- not COC[1 ] 

T: 

target <- (offset 15 ) 46 II offset II 0 2 

T+1 : 

if condition then 


PC <- PC + target 


else 


NullifyCurrentlnstruction 


endif 


Exceptions: 

Coprocessor unusable exception 
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BC1T 


Branch On FPU True 
(Coprocessor 1) 


BC1T 


31 26 25 21 20 16 15 0 


COP1 

BC 

BCT 

offset 

010001 

01000 

0000 1 


6 

5 

5 

16 


Format: 

BC1T offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. If the result of the last floating-point compare is true, 
the program branches to the target address, with a delay of one 
instruction. 


Operation: 


T-1: 

condition <- COC[1] 

T: 

target <— (offset 15 ) 46 II offset II 0 2 

T+1: 

if condition then 


PC PC + target 


endif 


Exceptions: 

Coprocessor unusable exception 
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BC1TL 


Branch On FPU True Likely 
(Coprocessor 1) 


BC1TL 


31 26 25 21 20 1615 0 


COP1 

BC 

BCTL 

offset 

010001 

01 000 

0001 1 


6 

5 

5 

16 


Format: 

BC1TL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two bits 
and sign-extended. 

If the result of the last floating-point compare is true, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 

Operation: 


T-1 : condition <- COC[1 ] 

T: target <- (offset 15 ) 46 II offset II 0 2 

T+1 : if condition then 

PC <- PC + target 

else 

NullifyCurrentlnst ruction 
endif 


Exceptions: 

Coprocessor unusable exception 
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C.cond.fmt "TompTre"* C.cond.fmt 


31 26 25 21 20 16 15 11 10 6 5 4 3 0 


COP1 

01 0001 

fmt 

ft 

fs 

0 

00000 

FC* 


6 

5 

5 

5 

5 

2 

4 


Format: 

C.cond.fmt fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically compared. 

A result is determined based on the comparison and the conditions 
specified in the instruction. If one of the values is a Not a Number (NaN), 
and the high-order bit of the condition field is set, an invalid operation 
exception is taken. After a one-instruction delay, the condition is available 
for testing with branch on floating-point coprocessor condition 
instructions. 

Comparisons are exact and can neither overflow nor underflow. Four 
mutually-exclusive relations are possible as results: less than, equal, 
greater than, and unordered. The last case arises when one or both of the 
operands are NaN; eveiy NaN compares unordered with everything, 
including itself. 

Comparisons ignore the sign of zero, so +0 = -0. 

This instruction is valid only for single- and double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Note: *See “FPU Instruction Opcode Bit Encoding” at the end of 
Appendix B. 
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Operation: 


T: if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then 

less 4- false 
equal 4- false 
unordered 4- true 
if cond 3 then 

signal InvalidOperationException 

endif 

0|30 

less 4- ValueFPR(fs, fmt) < ValueFPR(ft, fmt) 
equal 4- ValueFPR(fs, fmt) = ValueFPR(ft, fmt) 
unordered 4 - false 

endif 

condition 4- (cond 2 and less) or (condi and equal) or 
(cond 0 an d unordered) 

FCR[31] 2 3 4- condition 
COC[1]4- condition 


Exceptions: 

Coprocessor unusable 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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CElL.L.fmt cimn^toLong CElL.L.fmt 

Fixed-Point Format 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

0 

00000 

fs 

fd 

CEIL.L 

001010 

6 

5 

5 

5 

5 

6 


Format: 

CEIL.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, Jmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to +°° (2). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity, NaN, or the correctly rounded 
integer result is outside of -2 s3 to 2 s3 - 1, the Invalid operation exception 
is raised. If the Invalid operation is not enabled then no exception is taken 
and 2 63 -l is returned. 

Operation: 


T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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CElL.W.fmt 


Floating-Point 
Ceiling to Single 
Fixed-Point Format 


CElL.W.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

010001 

fmt 

0 

00000 

fs 

fd 

CEIL.W 

001110 

6 

5 

5 

5 

5 

6 


Format: 

CElL.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to +°° (2). 

This instruction is valid only for conversion from a single- or double- 
precision floating-point formats. The operation is not defined if bit 0 of any 
register specification is set and the FR bit in the Status register equals zero, 
since the register numbers specify an even-odd pair of adjacent 
coprocessor general registers. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity or NaN, or the correctly 
rounded integer result is outside of -2 31 to 2 31 - 1, the Invalid operation 
exception is raised. If the Invalid operation is not enabled then no 
exception is taken and 2 31 -1 is returned. 

Operation: 


T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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CFC1 


Move Control Word From FPU 
(Coprocessor 1) 


CFC1 


31 26 

25 21 

20 16 

15 11 

10 0 

COP1 

CF 

rt 

fs 

0 

010001 

0001 0 



000 0000 0000 

6 

5 

5 

5 

11 


Format: 

CFC1 rt, fs 

Description: 

The contents of the FPU control register fs are loaded into general 
register rt. 

This operation is only defined when fs equals 0 or 31. 

The contents of general register rt are undefined for time T of the 
instruction immediately following this load instruction. 

Operation: 

T: temp FCR[fs] 

T+1: GPR[rt] <- (temp 31 ) 32 II temp 


Exceptions: 

Coprocessor unusable exception 
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CTC1 


Move Control Word To FPU 
(Coprocessor 1) 


CTC1 


31 26 

25 21 

20 16 

15 11 

10 0 

COP1 

CT 

rt 

fs 

0 

010001 

00110 



000 0000 0000 

6 

5 

5 

5 

11 


Format: 

CTC1 rt, fs 

Description: 

The contents of general register rt are loaded into FPU control register 
fs. This operation is only defined when fs equals 31. 

Writing to Control Register 31, the floating-point Control/ Status 
register, causes an interrupt or exception if any cause bit and its 
corresponding enable bit are both set. The register will be written before 
the exception occurs. The contents of floating-point control register fs are 
undefined for time T of the instruction immediately following this load 
instruction. 

Operation: 


T: temp<- GPR[rt] 31--0 

T+1: FCR[fs] temp 

COC[1] <- FCR[31] 23 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division by zero exception 
Inexact exception 
Overflow exception 
Underflow exception 
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CVT.D.fmt 


Floating-Point 
Convert to Double 
Floating-Point Format 


CVT.D.fmt 


31 26 25 21 20 16 15 11 10 65 0 


COP1 

01 0001 

fmt 

0 

00000 

fs 

fd 

CVT.D 

100001 

6 

5 

5 

5 

5 

6 


Format: 

CVT.D.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs is interpreted 
in the specified source format, Jmt, and arithmetically converted to the 
double binary floating-point format. The result is placed in the floating- 
point register specified by fd. 

This instruction is valid only for conversions from single floating-point 
format, 32-bit or 64-bit fixed-point format. 

If the single floating-point or single fixed-point format is specified, the 
operation is exact. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 


T: StoreFPR (fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
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CVT.l_.fmt ConverHo Long CVT.L.fmt 

Fixed-Point Format 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

0 

00000 

fs 

fd 

CVT.L 

100101 

6 

5 

5 

5 

5 

6 


Format: 

CVT.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by/s are 
interpreted in the specified source format, font, and arithmetically 
converted to the long fixed-point format. The result is placed in the 
floating-point register specified by fd. 

This instruction is valid only for conversions from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded 
integer result is outside of -2 63 to 2 63 -l, the Invalid operation exception is 
raised. If the Invalid operation is not enabled then no exception is taken 
and 2 s3 - 1 is returned. 

Operation: 


T: StoreFPR (fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 


B - 22 






FPU Instruction Set Details 


Appendix B 


CVT.S.fmt Convert to Single CVT.S.fmt 

Floating-Point Format 


31 26 25 21 

20 16 

15 11 

10 6 

5 0 

COP1 

fmt 

0 

fs 

fd 

CVT.S 

010001 


00000 



100000 

6 

5 

5 

5 

5 

6 


Format: 

CVT.S.fmt fd, fs 

Description: 

The contents of the floating-point register specified by/s are 
interpreted in the specified source format, font, and arithmetically 
converted to the single binary floating-point format. The result is placed 
in the floating-point register specified by/d. Rounding occurs according to 
the currently specified rounding mode. 

This instruction is valid only for conversions from double floating-point 
format, or from 32-bit or 64-bit fixed-point format. The operation is not 
defined if bit 0 of any register specification is set and the FR bit in the 
Status register equals zero, since the register numbers specify an even-odd 
pair of adjacent coprocessor general registers. When the FR bit in the 
Status register equals one, both even and odd register numbers are valid. 

Operation: 


T: StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
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CVT.W.fmt 


Floating-Point 
Convert to 
Fixed-Point Format 


CVT.W.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

fmt 

0 

fs 

fd 


01 0001 


00000 




6 

5 

5 

5 

5 

6 


Format: 

CVT.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by/d. This instruction is valid only for 
conversion from a single- or double-precision floating-point formats. The 
operation is not defined if bit 0 of any register specification is set and the 
FR bit in the Status register equals zero, since the register numbers specify 
an even-odd pair of adjacent coprocessor general registers. When the FR 
bit in the Status register equals one, both even and odd register numbers 
are valid. 

When the source operand is an Infinity or NaN, or the correctly 
rounded integer result is outside of -2 31 to 2 31 -1, an Invalid operation 
exception is raised. If Invalid operation is not enabled, then no exception 
is taken and 2 31 -1 is returned. 

Operation: 


T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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DIV.fmt Floating-Point Divide DIV.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

ft 

fs 

fd 

DIV 

00001 1 

6 

5 

5 

5 

5 

6 


Format: 

DIV.fmt fd, fs, ft 

Description: 

The contents of the floating-point registers specified by/s and ft are 
interpreted in the specified format and arithmetically divided. The result 
is rounded as if calculated to infinite precision and then rounded to the 
specified format, according to the current rounding mode. The result is 
placed in the floating-point register specified by fd. 

This instruction is valid for only single or double precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set 
and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 


T: StoreFPR (fd, fmt, ValueFPR(fs, fmt)/ValueFPR(ft, fmt)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division-by-zero exception 
Inexact exception 
Overflow exception 
Underflow exception 
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niWICr*-l Doubleword Move From 
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Format: 

DMFC1 rt, fs 

Description: 

The contents of register/s from the floating-point coprocessor is stored 
into processor register rt. 

The contents of general register rt are undefined for time T of the 
instruction immediately following this load instruction. 

The FR bit in the Status register specifies whether all 32 registers of the 
R4600 are addressable. When FR equals zero, this instruction is not 
defined when the least significant bit of/s is non-zero. When FR is set, fs 
may specify either odd or even registers. 

Operation: 


T: 

if SR 26 = 1 then 


data <- CPR[1 ,fs] 


else 


data<- CPR[1,fs 4 •, II 0] 


endif 

T+1: 

GPR[rt] <- data 


Exceptions: 

Coprocessor unusable exception 
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DMTC1 


Doubleword Move To 
Floating-Point Coprocessor 


DMTC1 


31 26 25 21 20 16 15 11 10 0 


COP1 

DMT 
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000 0000 00 00 
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5 

5 
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Format: 

DMTC1 rt, fs 

Description: 

The contents of general register rt are loaded into coprocessor register 
fs of the CPI. 

The contents of floating-point register fs are undefined for time Tof the 
instruction immediately following this load instruction. 

The FR bit in the Status register specifies whether all 32 registers of the 
R4600 are addressable. When FR equals zero, this instruction is not 
defined when the least significant bit of fs is non-zero. When FR equals 
one, fs may specify either odd or even registers. 


Operation: 


T: 

data <- GPR[rt] 

T+1: 

if SR 26 = 1 then 


CPR[1 , fs] data 


else 


CPR[1 , fs 4 i II 0] <- data 


endif 


Exceptions: 

Coprocessor unusable exception 
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FLOOR.L.fmt ESSES FLOOR.L.fmt 

Fixed-Point Format 
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Format: 

FLOORL.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, Jrnt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to -» (3). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded 
integer result is outside of -2 s3 to 2 s3 - 1, the Invalid operation exception 
is raised. If the Invalid operation is not enabled then no exception is taken 
and 2 63 -l is returned. 

Operation: 


T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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FLOOR.W.fmt FLOOR.W.fmt 

Fixed-Point Format 
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Format: 

FLOOR. W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by/s are 
interpreted in the specified source format, Jmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by/d. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to (RM = 3). 

This instruction is valid only for conversion from a single- or double- 
precision floating-point formats. The operation is not defined if bit 0 of any 
register specification is set and the FR bit in the Status register equals zero, 
since the register numbers specify an even-odd pair of adjacent 
coprocessor general registers. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity or NaN, or the correctly 
rounded integer result is outside of -2 31 to 2 31 -1, an Invalid operation 
exception is raised. If Invalid operation is not enabled, then no exception 
is taken and 2 31 -1 is returned. 

Operation: 


T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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LDC1 


Load Doubleword to FPU 
(Coprocessor 1) 


LDC1 


31 26 
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base 

ft 

offset 
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5 
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Format: 

LDC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. 

When FR = 0, the contents of the doubleword at the memory location 
specified by the effective address is loaded into registers ft and jf 't+1 of the 
floating-point coprocessor. This instruction is not valid, and is undefined, 
when the least significant bit of ft is non-zero. 

When FR = 1, the contents of the doubleword at the memory location 
specified by the effective address are loaded into the 64-bit register ft of the 
floating point coprocessor. 

The FR bit of the Status register (SR 26 ) specifies whether all 32 registers 
of the R4600 are addressable. If FR equals zero, this instruction is not 
defined when the least significant bit of ft is non-zero. If FR equals one, ft 
may specify either odd or even registers. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

Operation: 


T: vAddr 4 - ((offset^ 48 I I offsetis.o) + GPR[base] 

(pAddr, uncached) 4 - AddressTranslation (vAddr, DATA) 
data 4 - LoadMemoiy (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
if SR 26 = 1 then 
CPR[1, ft] 4 - data 
else 

CPR[1, ft 4 ..i I I 0] 4 - data 
endif 


Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWC1 


Load Word to FPU 
(Coprocessor 1) 


LWC1 


31 26 

25 21 

20 16 

15 0 


base 

ft 

offset 





6 

5 

5 

16 


Format: 

LWC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The contents of the 
word at the memory location specified by the effective address is loaded 
into register ft of the floating-point coprocessor. 

The FR bit of the Status register specifies whether all 64-bit Floating- 
Point registers are addressable. If FR equals zero, LWC1 loads either the 
high or low half of the 16 even Floating-Point registers. If FR equals one, 
LWC1 loads the low 32-bits of both even and odd Floating-Point registers. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset]^ 48 I I offset^. o) + GPR[base] 

(pAddr, uncached) <— AddressTranslation (vAddr, DATA) 
pAddr <r- pAddr PSIZE . 13 I I (pAddr 2 ..o xor (ReverseEndian I I 0 2 )) 
mem <- LoadMemoryfuncached, WORD, pAddr, vAddr, DATA) 
byte <- vAddr 2 0 xor (BigEndianCPU I I 0 2 ) 
if SR 26 = 1 then 

CPR[1, ft] <- undefined 32 I I mem 31+8 * byte 8 * byte 
else if fto=0 then 

CPR[1, ft 41 I I 0] <- CPR[1, ft 41 I I 0]g 4 3 2 I I mem3 1+8 * byte 8 « byte 
else 

CPR[1, ft 4 i I I 0] < — mem 31+8 « byte 8 * byte I I CPR[1, ft 4- .i I I 0] 31 0 
endif 


Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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MFC1 


Move From FPU 
(Coprocessor 1) 


MFC1 
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10 0 
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MF 

rt 

fs 

0 
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00000 



000 0000 0000 

6 

5 

5 

5 
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Format: 

MFC1 rt, fs 

Description: 

The contents of register fs from the floating-point coprocessor are 
loaded into processor register rt 

The contents of register rt are undefined for time T of the instruction 
immediately following this load instruction. 

The FR bit of the Status register specifies whether all 32 registers of the 
R4600 are addressable. If FR equals zero, MFC1 loads either the high or 
low half of the 16 even Floating-Point registers. If FR equals one, MFC1 
stores the low 32-bits of both even and odd Floating-Point registers. 

Operation: 


T: ifSR 26 = 1then 

data <- CPR[1 , fs] 
else if fs 0 = 0 then 

data <- CPR[1, fs 4 -1 II 0] 31 0 

else 

data < — CPR[1, fs 4 -j II 0] 33 <>32 

endif 

T+1 : GPR[rt] (data 31 ) 32 II data 


Exceptions: 

Coprocessor unusable exception 
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MOV.fmt Floating-Point Move MOV.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 


COP1 

01 0001 

fmt 

0 
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fs 

fd 

MOV 

0001 1 0 
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5 
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Format: 

MOV.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified format and are copied into the FPU register specified by fd. 

The move operation is non-arithmetic; no IEEE 754 exceptions occur 
as a result of the instruction. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set 
and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 


T: StoreFPR(fd, fmt, ValueFPR(fs, fmt)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
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MTC1 


Move To FPU 
(Coprocessor 1) 


MTC1 


31 26 

25 21 

20 16 

15 11 

10 0 

COP1 

MT 

rt 

fs 

0 

01 0001 

00100 



000 0000 0000 

6 

5 

5 

5 
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Format: 

MTC1 rt, fs 

Description: 

The contents of register rt are loaded into the FPU general register at 
location fs. 

The contents of floating-point register fs is undefined for time T of the 
instruction immediately following this load instruction. 

The FR bit of the Status register specifies whether all 32 registers of the 
R4600 are addressable. If FR equals zero, MTC1 loads either the high or 
low half of the 16 even Floating-Point registers. If FR equals one, MTC1 
loads the low 32-bits of both even and odd Floating-Point registers. 

Operation: 


T: data 4- GPR[rt] 3 i,. 0 

T+1: ifSR 2 $=1then 

CPR[1 , fs] 4 - undefined 32 II data 
else if fs 0 =0 then 

CPR[1, fs 4 ^ II 0] 4 — CPR[1 , fs 4 II 0] 33 32 II data 

else 

CPR[1, fs 4 -1 II 0] 4 - data II CPR[1,fe 4i> i II 0] 31- 0 

endif 


Exceptions: 

Coprocessor unusable exception 
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MUL.fmt Floating-Point Multiply MUL.fmt 
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fs 

fd 
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Format: 

MUL.fmt fd, fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically multiplied. The 
result is rounded as if calculated to infinite precision and then rounded to 
the specified format according to the current rounding mode. The result 
is placed in the floating-point register specified by fd. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set 
and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 


T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
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NEG.fmt Floating-Point Negate NEG.fmt 
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5 
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Format: 

NEG.fmt fd, fs 

Description: 

The contents of the FPU register specified by/s Eire interpreted in the 
specified format and the arithmetic negation is taken (polarity of the sign- 
bit is changed). The result is placed in the FPU register specified by fd. 

The negate operation is arithmetic: an NaN operand signals invalid 
operation. 

This instruction is valid only for single- or double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 


T: StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt))) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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ROUND.L.fmt ROUND.L.fmt 

Fixed-Point Format 
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0 

00000 

fs 

fd 

ROUND. L 
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5 

5 
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Format: 

ROUND.L.fmt fd. fs 

Description: 

The contents of the floating-point register specified by/s are 
interpreted in the specified source format, Jmt, and arithmetically 
converted to the long fixed-point format. The result is placed in the 
floating-point register specified by/d. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to nearest/ even (0). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded 
integer result is outside of -2 s3 to 2 s3 - 1, the Invalid operation exception 
is raised. If the Invalid operation is not enabled then no exception is taken 
and 2 s3 -1 is returned. 

Operation: 


T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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ROUND.W.fmt Floating-Point ROUND.W.fmt 

Round to Single 
Fixed-Point Format 
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Format: 

ROUND.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, font, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by/d. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round to the nearest/ even 
(RM = 0). 

This instruction is valid only for conversion from a single- or double- 
precision floating-point formats. The operation is not defined if bit 0 of any 
register specification is set and the FR bit in the Status register equals zero, 
since the register numbers specify an even-odd pair of adjacent 
coprocessor general registers. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity or NaN, or the correctly 
rounded integer result is outside of -2 31 to 2 31 -1, an Invalid operation 
exception is raised. If Invalid operation is not enabled, then no exception 
is taken and 2 31 -1 is returned. 

Operation: 


T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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SDC1 


Store Doubleword from FPU 
(Coprocessor 1) 


SDC1 


31 26 
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ft 

offset 
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Format: 

SDC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. 

When FR = 0, the contents of registers ft and ft+1 from the floating- 
point coprocessor are stored at the memory location specified by the 
effective address. This instruction is not valid, and is undefined, when the 
least significant bit of ft is non-zero. 

When FR = 1, the 64-bit register ft is stored to the contents of the 
doubleword at the memory location specified by the effective address. The 
FR bit of the Status register (SR 26 ) specifies whether all 32 registers of the 
R4600 are addressable. When FR equals zero, this instruction is not 
defined if the least significant bit of ft is non-zero. If FR equals one, ft may 
specify either odd or even registers. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

Operation: 


T: vAddr <- (offset^) 16 I I offset 150 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
if SR 26 = 1 

data <- CPR[1, ft] 

else 

data CPR[1, ft 41 I I 0) 

endif 

StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA) 


Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SQRT.fmt 'square Root’ SQRT.fmt 
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Format: 

SQRT.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified format and the positive arithmetic square root 
is taken. The result is rounded as if calculated to infinite precision and 
then rounded to the specified format, according to the current rounding 
mode. If the value of fs corresponds to -0, the result will be -0. The result 
is placed in the floating-point register specified by fd. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set 
and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 


T: StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt))) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
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SUB.fmt Floating-Point Subtract SUB.fmt 
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Format: 

SUB.fmt fd. fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are 
interpreted in the specified format and arithmetically subtracted. The 
result is rounded as if calculated to infinite precision and then rounded to 
the specified format, according to the current rounding mode. The result 
is placed in the floating-point register specified by fd. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is set 
and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 


T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) - ValueFPR(ft, fmt)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
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SWC1 


Store Word from FPU 
(Coprocessor 1) 


SWC1 
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ft 

offset 
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Format: 

SWC1 ft, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. Hie contents of 
register ft from the floating-point coprocessor are stored at the memory 
location specified by the effective address. 

The FR bit of the Status register specifies whether all 64-bit floating- 
point registers are addressable. 

If FR = 0, SWC1 stores either the high or low half of the 16 even 
floating-point registers. 

If FR = 1, SWC1 stores the low 32-bits of both even and odd floating- 
point registers. 

If either of the two least-significant bits of the effective address are non- 
zero, an address error exception occurs. 

Operation: 


T: vAddr <- ((offset 15 ) 48 II offset 15 0 ) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
pAddr<- pAddrpsizE-i ..3 I I (pAddr 2 ..o xor (ReverseEndian II 0 2 )) 
byte <- vAddr 2i .o xor (BigEndianCPU II 0 2 ) 
if SR 26 = 1 then 

data <- CPR[1, ft] 63 -8* byt e..o " O 8 *^ 
else if ft 0 =0 then 

data 4- CPR[1, ft 4 .. 1 II 0] 63 . 8 * byte o II O 8 *^ 6 

else 

data 4- O 32 ' 8 *^ 6 II CPR[1 , ft 4 .. 1 II 0] 63 ..32-8*byte 

endif 

StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 


Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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TRUNC.L.fmt £££%, TRUNC.L.fmt 

Fixed-Point Format 
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Format: 

TRUNC.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are 
interpreted in the specified source format, fmt, and arithmetically 
converted to the single fixed-point format. The result is placed in the 
floating-point register specified by/d. 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round toward zero (1). 

This instruction is valid only for conversion from single- or double- 
precision floating-point formats. 

When the source operand is an Infinity, NaN, or the correctly rounded 
integer result is outside of -2 s3 to 2 s3 - 1, the Invalid operation exception is 
raised. If the Invalid operation is not enabled then no exception is taken 
and 2 63 -l is returned. 

Operation: 


T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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TRUNC.W.fmt TRUNC.W.fmt 

Fixed-Point Format 
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Format: 

TRUNC.W.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified source format font and arithmetically converted to the single 
fixed-point format. The result is placed in the FPU register specified by fcL 

Regardless of the setting of the current rounding mode, the conversion 
is rounded as if the current rounding mode is round toward zero (RM = 1). 

This instruction is valid only for conversion from a single- or double- 
precision floating-point formats. The operation is not defined if bit 0 of any 
register specification is set and the FR bit in the Status register equals zero, 
since the register numbers specify an even-odd pair of adjacent 
coprocessor general registers. When the FR bit in the Status register 
equals one, both even and odd register numbers are valid. 

When the source operand is an Infinity or NaN, or the correctly 
rounded integer result is outside of -2 31 to 2 31 -1, an Invalid operation 
exception is raised. If Invalid operation is not enabled, then no exception 
is taken and 2 31 -1 is returned. 

Operation: 


T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 


Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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FPU Instruction Opcode Bit Encoding 



Key to Table: 

y Operation codes marked with a gamma cause a reserved instruction exception. They are 
reserved for future versions of the architecture. 

5 Operation codes marked with a delta cause unimplemented operation exceptions in all 
current implementations and are reserved for future versions of the architecture. 

T| Valid when 64-bit operand opcodes are enabled. 


Figure B.3 Bit Encoding for FPU Instructions 
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Introduction 

This appendix lists cycle operation counts and caveats for R4600/ 
R4700 cache operations timing. 

Caveats About Cache Operations 

1 . All cycle counts are in processor cycles. 

2. All cache ops have lower priority than cache misses, write backs and 
external requests. If the write back buffer contains unwritten data when 
a cache op is executed, the write back buffer will be retired before the cache 
op is begun. 

If an instruction cache miss occurs at the same time as a cache op is 
executed, the instruction cache miss will be handled first. Cache ops are 
mutually exclusive with respect to data cache misses. External requests 
will be completed before beginning a cache op. 

3. For all data cache ops the cache op machine waits for the store buffer 
and response buffer to empty before beginning the cache op. This can add 
3 cycles to any data cache op if there is data in the response buffer or store 
buffer. The response buffer contains data from the last data cache miss 
that has not yet been written to the data cache. The store buffer contains 
delayed store data waiting to be written to the data cache. 

4. Cache ops of the form xxxx_Writeback_xxxx may perform a write back 
which will fill the write back buffer. Write backs can affect subsequent 
cache ops, since they will stall until the write back buffer is written back 
to memory. Cache ops which fill the write back buffer are noted as 
(writeback) in the following tables. 

5. All cycle counts are best case assuming no interference from the 
mechanisms described above. 

Cache Operations Tables 

Table C.l and Table C.2 show data cache and instruction cache opera- 
tions information. A detailed explanation of the Fill_I equation follows 
Table C.2. 
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Code 1 

Name 

Number of Cycles 

0 

Index_Writeback_Invalidate_D 

10 cycles if the cache line is clean. 

12 cycles if the cache line is dirty 
(Writeback). 

1 

Index _Load__TagJD 

7 cycles. 

2 

Index_Store_Tag_D 

8 cycles. 

3 

Create_Dirty_Exclusive_D 

10 cycles for a cache hit. 

13 cycles for a cache miss if the cache 
line is clean. 

15 cycles for a cache miss if the cache 
line is dirty (Writeback). 

4 

HitJnvalidate_D 

7 cycles for a cache miss. 

9 cycles for a cache hit. 

5 

Hit_Writeback_I nvalidate_D 

7 cycles for a cache miss. 

12 cycles for a cache hit if the cache 
line is clean. 

14 cycles for a cache hit if the cache 
line is dirty (Writeback). 

7 

Hit_Writeback_D 

7 cycles for a cache miss. 

10 cycles for a cache hit if the cache 
line is clean. 

14 cycles for a cache hit if the cache 
line is dirty (Writeback). 

Note: 

1 Code number corresponds to the code column of the CACHE instruction in Appendix A 


Table C.l Primary Data Cache Operations 
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Code 1 

Name 

Number of Cycles 

0 

IndexJnvalidateJ 

7 cycles. 

1 

Index_Load_Tag_I 

7 cycles. 

2 

Index_Store_Tag_I 

8 cycles. 

3 

n/a 

n/a 

4 

HitJnvalidateJ 

7 cycles for a cache miss. 

9 cycles for a cache hit. 

5 

FillJ 

Cycle number must be calculated based on the sys- 
tem response to a memory access, because FillJ 
causes an instruction cache refill from memory. 

This equation yields the number of processor cycles 
for a Fill J cache op: 2 

Number of cycles for a Fill I CacheOp= 10 + {0 
- (SYSDIV - 1)} + (2 x SYSDIV) + 

(ML x SYSDIV) + (Dx SYSDIV) 3 

6 

Hit_Writeback_I 

7 cycles for a cache miss. 

20 cycles for a cache hit (Writeback). 

Note: 

1 Code number corresponds to the code column of the CACHE instruction in Appendix A. 
2 For definitions and discussion of the FillJ equation variables refer to the subsection 
“Details of the Fill_I Equation,” which follows this table. 

^he term {0 - (SYSDIV - 1) has a value between 0 and (SYSDIV - 1), depending on the 
alignment of the execution of the cache op with the system clock. 


Table C.2 Primary Instruction Cache Operations 


Details on the Fill_I Equation 

These are the definitions for the Hit_Writeback_I equation in Table C.2: 

SYSDIV: Number of processor cycles per system cycle; ranges from 
2 - 8 . 

ML: Number of system cycles of memory latency, defined as 
the number of cycles the SysAD bus is driven by the 
external agent before the first double word of data 
appears. 

D: Number of system cycles required to return the block of 
data, defined as the number of cycles beginning when the 
first double word of data appears on the SysAD bus and 
ending when the last double word of data appears on the 
SysAD bus, inclusive. 
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The R4600/R4700 provides a means to reduce the amount of power 
consumed by the internal core when the CPU would otherwise not be 
performing any useful operations. This is known as “Standby Mode” and 
is discussed in this appendix. 

Entering Standby Mode 

To enter Standby Mode, first execute the WAIT instruction. When the 
WAIT instruction finishes the W pipe-stage, if the SysAD bus is currently 
idle, the internal clocks will shut down, thus freezing the pipeline. The 
PLL, internal timer, some of the input pin clocks (Int[5:0]*, NMI*, 
ExtRqst*, Reset* and ColdReset*) and the output clocks (TClock[l:0], 
RClock[l:0], SyncOut, ModeClock and MasterOut) will continue to run. 
If the conditions are not correct when the WAIT instruction finishes the W 
pipe-stage (i.e., the SysAD bus is not idle), the WAIT is treated as a NOP. 

Once the CPU is in Standby Mode, any interrupt, including ExtRqst* or 
Reset*, will cause the CPU to exit Standby Mode. 
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This appendix identifies the R4600 and R4700 Coprocessor 0 hazards. 
In Table E. 1 the number of instructions required between instruction A 
(which places a value in a CPO register) and instruction B (which uses the 
same register as a source) is computed using the following formula: 

(destination stage of A) - (source stage of B) - 1 


Operation 

SOURCE 

Name Stage 

DESTINATION 

Name Stage 

MTCO 

gprrt 

2(A) 

cpr rd 

4(W)a 

MFCO 

cpr rd 

2(A) 

gprrt 

4(W)a 

TLBR 

Index, TLB 

2(A) 

PageMask, 

EntryHi, EntryLoO, EntryLol 

U 

TLBWI 

TLBWR 

Index or Random, PageMask, 
EntryHi, Entry LoO, EntryLol 

2(A) 

TLB 

u 

TLBP 

PageMask, EntryHi 

2(A) 

Index 

4 (W) 

ERET 

EPC or ErrorEPC, 

Status. ERL 

2(A) 

Status.EXL, Status.ERL 

4(W)y 

LLbit 

4(W) 

CACHE Index Load 

Tag 


■ 



CACHE Index Store 

Tag 

TagLo, TagHi, ECC 

g| 


Instruction fetch 

EntryHi. ASID, Status.KSU, Sta- 
tus. RE, Config.KOC, TLB 

jgj 

Status.ERL, Status.EXL 

0(I)Y 

Instruction fetch 
exception 


EPC, Status, Cause 

4(W) 

BadVAddr, Context, EntryHi 

1(1)5 

Coprocessor usable 
test 

Status. CU, Status.KSU, Sta- 
tus.EXL, Status.ERL 

1(R) 


Interrupt 

Cause.IP, Status.IM, Status.IE, 
Status.EXL, Status.ERL 


Load/Store 

EntryHi. ASID, Status.KSU, Sta- 
tus. RE, Status.ERL, Status.EXL 
Config.KOC, TLB 

2(A) 

Load/Store exception 


■ 

EPC, Status, Cause, Bad- 
VAddr, Context, EntryHi 

4(W) 


Notes: 

a There must be at least one instruction between a MTCO and a MFCO. 
p TLBW_ instructions will cause a one cycle slip. 

y Instructions fetches following an ERET will see a change in EXL or ERL in Stage 2 of the ERET in anticipation 
of the completion of the ERET. If the ERET does not complete, these instructions are killed before they commit 
changes in state other than noted by d. The pipestage corresponding to the stage field is given in parentheses. 


Table B. 1 Coprocessor O Hazards 
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Certain combinations of instructions are not permitted because the 
results of executing such combinations are unpredictable in the face of 
the events such as pipeline delays, cache misses, interrupts, and excep- 
tions. 

Most hazards result from instructions modifying and reading state in 
different pipeline stages. Such hazards are defined between pairs of 
instructions, not on a single instruction in isolation. Other hazards are 
associated with restartability of instructions in the presence of exceptions. 
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